Distributed Database Systems
Distributed Query Processing

Katja Hose, Ralf Schenkel
Max-Planck-Institut für Informatik, Cluster of Excellence MMCI

November 10, 2011
November 17, 2011

Contents I
1    Motivation
2    Detour on centralized query processing
         Translating SQL into relational algebra
         Phases of centralized query processing
         Query parsing
         Query transformation
         Query optimization
3    Basics of distributed query processing
         Phases of distributed query processing
         Introduction
         Meta data management
         Data localization
4    Global query optimization
         Main questions
            Katja Hose                 Distributed Database Systems       November 10, 2011   1 / 167               Katja Hose          Distributed Database Systems   November 10, 2011   2 / 167







Contents II
         Global query optimizer
         Distributed cost model
         Join order optimization
         Total time models
         Response time models
5    Summary

Motivation

The task of query processing is to answer user queries.

Example
        How many students are at Saarland University?
        Answer: 18,000
Additional constraints
        Low response times
        High query throughput
        Efficient hardware usage
        ...




Motivation

Differences from centralized query processing
        Considering the physical data distribution during query optimization
        Considering communication costs
Assumptions
        Data is distributed among multiple nodes
        Existence of a global conceptual schema, which is used by all nodes
        Queries are formulated on the global schema

2    Detour on centralized query processing

Translating SQL into relational algebra

SQL query structure:

        select distinct a1, ..., an
        from            R1, ..., Rk
        where           F

Algorithm:
    1    Translating the from clause
Let $R_1, \dots, R_k$ be the relations in the from clause of the query.
Construct expression:

        $$R = \begin{cases} R_1 & \text{if } k = 1 \\ ((\dots(R_1 \times R_2) \times \dots) \times R_k) & \text{otherwise} \end{cases}$$



Translating SQL into relational algebra

Algorithm:
    2    Translating the where clause
Let $F$ be the predicate in the where clause of the query (if a where clause exists).
Construct expression:

        $$W = \begin{cases} R & \text{if there is no where clause} \\ \sigma_F(R) & \text{otherwise} \end{cases}$$

    3    Translating the select clause
Let $a_1, \dots, a_n$ (or "*") be the projection in the select clause of the query.
Construct expression:

        $$S = \begin{cases} W & \text{if the projection is "*"} \\ \pi_{a_1, \dots, a_n}(W) & \text{otherwise} \end{cases}$$

Output:
        $S$
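The three steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the query arrives already split into its three clauses, the predicate is treated as an opaque string, and the function names `translate` and `wrap` as well as the bracket notation `σ[...]` / `π[...]` are hypothetical choices made for this sketch, not part of the lecture.

```python
from typing import Optional, Sequence


def wrap(expr: str) -> str:
    """Parenthesize an operand unless it is already a parenthesized product."""
    return expr if expr.startswith("(") else f"({expr})"


def translate(select: Sequence[str], relations: Sequence[str],
              where: Optional[str] = None) -> str:
    """Build the algebra expression S for a select-from-where query."""
    if not relations:
        raise ValueError("the from clause needs at least one relation")

    # Step 1: from clause -> left-deep Cartesian product R.
    r = relations[0]
    for name in relations[1:]:
        r = f"({r} × {name})"

    # Step 2: where clause -> selection W (or R itself if there is none).
    w = r if where is None else f"σ[{where}]{wrap(r)}"

    # Step 3: select clause -> projection S (or W itself for "*").
    if list(select) == ["*"]:
        return w
    return f"π[{', '.join(select)}]{wrap(w)}"


print(translate(["EName"], ["Employees", "Assignment"],
                "EID = ENo ∧ PNo = 'P1'"))
```

The result is the naive expression: a projection over a selection over a Cartesian product, with no optimization applied yet.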





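Translating SQL into relational algebra

Example query
        select distinct e.EName, s.Salary
        from            Employees e, Salary s
        where           e.Title = s.Title and s.Salary ≥ 60000

Step 1, from clause ($k = 2$, so $R$ is the product of both relations):

        $$R = \text{Employees} \times \text{Salary}$$

Step 2, where clause (a where clause exists, so $W = \sigma_F(R)$):

        $$W = \sigma_{e.Title = s.Title \,\land\, s.Salary \ge 60000}(\text{Employees} \times \text{Salary})$$

Step 3, select clause (the projection is not "*", so $S = \pi_{a_1, \dots, a_n}(W)$):

        $$S = \pi_{e.EName,\, s.Salary}(W)$$

Workflow for centralized query processing
        [Figure: phases of centralized query processing; not recoverable from the extracted text]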


Query parsing

Transform a declarative query into an internal representation
        Query formulated using a declarative query language, e.g., SQL
        The parser translates the query into an internal representation
                Called naive query plan
                Plan described by an operator tree of relational algebra operators

Example
        Database managing information about employees and projects
                Employees(EID, EName, Title)
                Assignment(ENo, PNo, Duration)
        Query: return the names of all employees working for project 'P1'
                SELECT EName
                FROM Employees e, Assignment a
                WHERE e.EID = ENo AND PNo='P1'






Example

Query
        SELECT EName
        FROM Employees e, Assignment a
        WHERE e.EID = ENo AND PNo='P1'
Translation into relational algebra

        $$\pi_{\text{EName}}(\sigma_{\text{PNo} = \text{'P1'} \,\land\, \text{Employees.EID} = \text{Assignment.ENo}}(\text{Employees} \times \text{Assignment}))$$

In contrast to the SQL statement, the algebra statement already contains
the required basic evaluation operators.

Operator tree
        [Figure: operator tree of the expression above, with the projection at the root, the selection below it, and the Cartesian product of Employees and Assignment at the bottom]
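One possible in-memory form of such an operator tree is sketched below. The class `Op` and the helper `render` are hypothetical names introduced for this illustration; real systems use richer node types (one per operator, with typed predicates), so this is a minimal sketch rather than the representation used by any particular parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """One node of an operator tree: an operator label plus its inputs."""
    label: str
    children: List["Op"] = field(default_factory=list)


def render(node: Op, depth: int = 0) -> str:
    """Return the tree as indented text, one operator per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


# Naive plan for the example query: leaves are relations, inner nodes operators.
plan = Op("π EName", [
    Op("σ PNo='P1' ∧ Employees.EID=Assignment.ENo", [
        Op("×", [Op("Employees"), Op("Assignment")]),
    ]),
])

print(render(plan))
```

Evaluation proceeds bottom-up: the product is computed first, then filtered by the selection, then reduced by the projection.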




Query transformation

Steps
    1   Name resolution
        Transforming object names into internal names
    2   Semantic analysis
        Checking for global relations and attributes, view expansion, global access control
    3   Normalization
        Transforming predicates into a canonical format
    4   Simple algebraic rewriting
        Application of heuristics to eliminate bad plans





Semantic analysis

        Check if the global schema defines all attributes and relations
        referenced in the query
        If the query is formulated on a view, replace references to view
        relations/attributes with references to global relations/attributes
        Perform simple integrity checks, e.g., do the attributes used in
        comparison predicates have the same type?
        Initial check whether the query has the rights to access the referenced
        relations/attributes

Normalization

Objective
        Simplify the subsequent optimization by transforming the query
        into a canonical format
        Selection and join predicates
                Conjunctive normal form vs. disjunctive normal form
                Conjunctive normal form:
                $$(p_{11} \lor p_{12} \lor \dots \lor p_{1n}) \land \dots \land (p_{m1} \lor p_{m2} \lor \dots \lor p_{mn})$$
                Disjunctive normal form:
                $$(p_{11} \land p_{12} \land \dots \land p_{1n}) \lor \dots \lor (p_{m1} \land p_{m2} \land \dots \land p_{mn})$$
        Transformation based on equivalence rules for logical operators





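Normalization

Equivalence rules
        $p_1 \land p_2 \iff p_2 \land p_1$ and $p_1 \lor p_2 \iff p_2 \lor p_1$
        $p_1 \land (p_2 \land p_3) \iff (p_1 \land p_2) \land p_3$ and $p_1 \lor (p_2 \lor p_3) \iff (p_1 \lor p_2) \lor p_3$
        $p_1 \land (p_2 \lor p_3) \iff (p_1 \land p_2) \lor (p_1 \land p_3)$ and
        $p_1 \lor (p_2 \land p_3) \iff (p_1 \lor p_2) \land (p_1 \lor p_3)$
        $\lnot(p_1 \land p_2) \iff \lnot p_1 \lor \lnot p_2$ and $\lnot(p_1 \lor p_2) \iff \lnot p_1 \land \lnot p_2$
        $\lnot(\lnot p_1) \iff p_1$

Example
        SELECT EName
        FROM Employees e, Assignment a
        WHERE e.EID = a.ENo AND Duration ≥ 3 AND (PNo='P1' OR PNo='P2')

Selection condition in disjunctive normal form

        $$(\text{EID} = \text{ENo} \land \text{Duration} \ge 3 \land \text{PNo} = \text{'P1'}) \lor (\text{EID} = \text{ENo} \land \text{Duration} \ge 3 \land \text{PNo} = \text{'P2'})$$

Selection condition in conjunctive normal form

        $$\text{EID} = \text{ENo} \land \text{Duration} \ge 3 \land (\text{PNo} = \text{'P1'} \lor \text{PNo} = \text{'P2'})$$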
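The step from the conjunctive to the disjunctive form is a repeated application of the distributive rule. A minimal sketch, assuming atomic predicates are opaque strings and a CNF formula is a list of clauses (each a set of atoms joined by ∨); the name `cnf_to_dnf` is hypothetical and no further simplification (such as absorption) is attempted:

```python
from itertools import product
from typing import FrozenSet, List

Clause = FrozenSet[str]


def cnf_to_dnf(cnf: List[Clause]) -> List[Clause]:
    """Distribute ∧ over ∨: pick one atom from every CNF clause."""
    terms: List[Clause] = []
    for choice in product(*[sorted(clause) for clause in cnf]):
        term = frozenset(choice)
        if term not in terms:       # drop duplicate conjunctions
            terms.append(term)
    return terms


# CNF of the example: EID = ENo ∧ Duration ≥ 3 ∧ (PNo='P1' ∨ PNo='P2')
cnf = [
    frozenset({"EID = ENo"}),
    frozenset({"Duration ≥ 3"}),
    frozenset({"PNo='P1'", "PNo='P2'"}),
]
for term in cnf_to_dnf(cnf):
    print(" ∧ ".join(sorted(term)))
```

Each returned set is one conjunction of the disjunctive normal form. For the example this yields two conjunctions, one per project number. In general the number of conjunctions grows with the product of the clause sizes, which is one reason the choice between the two normal forms matters.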



Simple algebraic rewriting

Simple optimizations that are always beneficial regardless of system state
        Elimination of redundant predicates
        Simplification of expressions
        Unnesting of subqueries and views
Tasks
        Recognize and simplify all expressions/operations/subqueries that
        are "obviously" unnecessary, redundant, or contradictory.
        Do not consider system state information, e.g., size of tables,
        existence of indexes, etc.
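As an illustration of what "obviously" redundant or contradictory can mean, the sketch below simplifies a conjunction of predicates purely syntactically. It assumes predicates are strings, with negation written as a leading `NOT `. The function name `simplify_conjunction` is hypothetical, and a real rewriter would compare predicates semantically rather than by string equality.

```python
from typing import List, Optional


def simplify_conjunction(predicates: List[str]) -> Optional[List[str]]:
    """Simplify p1 ∧ p2 ∧ ... where each item is an atom or "NOT <atom>".

    Returns the predicates without duplicates, or None if the conjunction
    contains both an atom and its negation (it can never be true).
    """
    seen: List[str] = []
    for pred in predicates:
        pred = pred.strip()
        negated = pred[4:] if pred.startswith("NOT ") else f"NOT {pred}"
        if negated in seen:
            return None             # p ∧ ¬p: contradictory
        if pred not in seen:        # p ∧ p ⇐⇒ p: redundant
            seen.append(pred)
    return seen


print(simplify_conjunction(["Duration >= 3", "PNo = 'P1'", "Duration >= 3"]))
print(simplify_conjunction(["PNo = 'P1'", "NOT PNo = 'P1'"]))
```

Neither check looks at table sizes or indexes, which is what makes this kind of rewriting safe to apply unconditionally.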



Query optimization

Steps
    1    Algebraic optimization
                Find a good relational algebra operator tree
                Heuristic query optimization
                Cost-based query optimization
                Statistical query optimization
    2    Physical optimization
                Find suitable algorithms for implementing the operations

Heuristics

        Use simple heuristics which usually lead to better performance
        The goal is not to find the optimal plan but to avoid the really
        bad ones
        Heuristics
                Break selections
                Complex selection criteria should be broken into multiple parts
                Push projection and push selection
                Cheap selections and projections should be performed as early as
                possible to reduce the sizes of intermediate results
                Force joins
                In most cases, using a join is much cheaper than using a Cartesian
                product and a selection
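The "force joins" heuristic can be made concrete by counting how many intermediate tuples each strategy builds. The sketch below is an assumption-laden toy: relations are lists of dictionaries, the data is invented for illustration, and `product_then_select` and `hash_join` are hypothetical helper names (a hash join is just one of several possible join algorithms).

```python
from collections import defaultdict


def product_then_select(left, right, lkey, rkey):
    """σ[lkey = rkey](left × right): build every pair, then filter."""
    pairs = [{**l, **r} for l in left for r in right]
    result = [row for row in pairs if row[lkey] == row[rkey]]
    return result, len(pairs)


def hash_join(left, right, lkey, rkey):
    """left ⋈[lkey = rkey] right: only matching pairs are ever built."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    result = [{**l, **r} for l in left for r in index.get(l[lkey], [])]
    return result, len(result)


# Hypothetical toy instances of the Employees and Assignment relations.
employees = [
    {"EID": 1, "EName": "Ann"},
    {"EID": 2, "EName": "Bob"},
    {"EID": 3, "EName": "Cyd"},
]
assignment = [
    {"ENo": 1, "PNo": "P1"},
    {"ENo": 3, "PNo": "P1"},
    {"ENo": 3, "PNo": "P2"},
]

slow, slow_built = product_then_select(employees, assignment, "EID", "ENo")
fast, fast_built = hash_join(employees, assignment, "EID", "ENo")
print(slow == fast, slow_built, fast_built)
```

Both strategies return the same rows, but the product materializes all nine pairs while the join builds only the three matching ones. The gap widens quickly as the relations grow.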




Algebraic optimization rules

Operator ⋈ is commutative:

        $$r_1 \bowtie r_2 \iff r_2 \bowtie r_1$$

Operator ⋈ is associative:

        $$(r_1 \bowtie r_2) \bowtie r_3 \iff r_1 \bowtie (r_2 \bowtie r_3)$$

For operator π in combination with another operator π, the "outer"
parameter dominates the "inner" one:

        $$\pi_X(\pi_Y(r_1)) \iff \pi_X(r_1) \quad \text{if } X \subseteq Y$$

Cascaded selections σ can be combined using logical and (∧). The
order of the selections is arbitrary, which follows from the commutativity of ∧:

        $$\sigma_{F_1}(\sigma_{F_2}(r_1)) \iff \sigma_{F_1 \land F_2}(r_1) \iff \sigma_{F_2}(\sigma_{F_1}(r_1))$$
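The projection and selection rules can be spot-checked on a small relation. The sketch below assumes relations are lists of dictionaries and that π uses set semantics. The helpers `select` and `project` and the sample rows are hypothetical, and a check on one instance illustrates the rules but does not prove them.

```python
def select(rows, pred):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]


def project(rows, attrs):
    """π with set semantics: keep attrs, drop duplicate rows."""
    out = []
    for row in rows:
        small = {a: row[a] for a in attrs}
        if small not in out:
            out.append(small)
    return out


# Hypothetical instance of Assignment(ENo, PNo, Duration).
assignment = [
    {"ENo": 1, "PNo": "P1", "Duration": 5},
    {"ENo": 2, "PNo": "P1", "Duration": 2},
    {"ENo": 3, "PNo": "P2", "Duration": 4},
    {"ENo": 3, "PNo": "P1", "Duration": 4},
]

f1 = lambda row: row["Duration"] >= 3
f2 = lambda row: row["PNo"] == "P1"

# Cascaded selections equal one selection with the conjunction, in any order.
combined = select(assignment, lambda row: f1(row) and f2(row))
print(select(select(assignment, f2), f1) == combined)
print(select(select(assignment, f1), f2) == combined)

# The outer projection dominates the inner one when X ⊆ Y.
X, Y = ["PNo"], ["PNo", "Duration"]
print(project(project(assignment, Y), X) == project(assignment, X))
```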





      Algebraic optimization rules

Operators π and σ commute if predicate F is defined based on the
projection attributes:

        \[ \sigma_F(\pi_X(r_1)) \iff \pi_X(\sigma_F(r_1)) \quad \text{if } \mathrm{attr}(F) \subseteq X \]

Alternatively, a change in ordering is possible if the projection is extended by
all necessary attributes:

        \[ \pi_{X_1}(\sigma_F(r_1)) \iff \pi_{X_1}(\sigma_F(\pi_{X_1, X_2}(r_1))) \quad \text{if } \mathrm{attr}(F) \subseteq X_1 \cup X_2 \]

Operators σ and ⋈ commute if all selection attributes are contained in the same
relation:

        \[ \sigma_F(r_1 \bowtie r_2) \iff \sigma_F(r_1) \bowtie r_2 \quad \text{if } \mathrm{attr}(F) \subseteq R_1 \]

A selection predicate can be split up in conjunction with a join (F = F1 ∧ F2) if
the attributes referred to by F1 and F2 are contained in different relations:

        \[ \sigma_F(r_1 \bowtie r_2) \iff \sigma_{F_1}(r_1) \bowtie \sigma_{F_2}(r_2) \quad \text{if } \mathrm{attr}(F_1) \subseteq R_1 \text{ and } \mathrm{attr}(F_2) \subseteq R_2 \]

In any case, part of a selection can be split off (F = F1 ∧ F2): F1 contains the
predicates referencing attributes of R1 only, F2 contains the remaining predicates,
which reference attributes of both relations:

        \[ \sigma_F(r_1 \bowtie r_2) \iff \sigma_{F_2}(\sigma_{F_1}(r_1) \bowtie r_2) \quad \text{if } \mathrm{attr}(F_1) \subseteq R_1 \]
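The rule for pushing a selection below a join can be checked on a toy instance. The following is a minimal sketch, assuming relations are lists of dicts and the join is a natural join on equally named attributes; the helper names and the sample tuples are illustrative and not taken from the slides.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                result.append({**t1, **t2})
    return result


def select(pred, r):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in r if pred(t)]


def as_set(r):
    """Order-independent representation of a relation for comparison."""
    return {frozenset(t.items()) for t in r}


# Hypothetical sample data: r1 has schema R1 = {EID, PNo}, r2 has R2 = {EID, EName}
r1 = [{"EID": 1, "PNo": "P1"}, {"EID": 2, "PNo": "P2"}, {"EID": 2, "PNo": "P1"}]
r2 = [{"EID": 1, "EName": "Ann"}, {"EID": 2, "EName": "Bob"}]


def f(t):
    # attr(F) = {PNo} is a subset of R1, so the rule applies
    return t["PNo"] == "P1"


late = select(f, natural_join(r1, r2))    # selection after the join
early = natural_join(select(f, r1), r2)   # selection pushed below the join
assert as_set(late) == as_set(early)
```

Both variants return the same tuples, but the pushed-down variant joins only the tuples that survive the selection.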




      Algebraic optimization rules

Commutativity of σ and ∪:

        \[ \sigma_F(r_1 \cup r_2) \iff \sigma_F(r_1) \cup \sigma_F(r_2) \]

Commutativity of σ and −:

        \[ \sigma_F(r_1 - r_2) \iff \sigma_F(r_1) - \sigma_F(r_2) \]

or, because every result tuple stems from r1, with the selection applied to r1 only:

        \[ \sigma_F(r_1 - r_2) \iff \sigma_F(r_1) - r_2 \]

Commutativity of π and ⋈:

        \[ \pi_X(r_1 \bowtie r_2) \iff \pi_X(\pi_{Y_1}(r_1) \bowtie \pi_{Y_2}(r_2)) \]

with

        \[ Y_1 = (X \cap R_1) \cup (R_1 \cap R_2) \]

and

        \[ Y_2 = (X \cap R_2) \cup (R_1 \cap R_2) \]

Pushing a projection is possible if all Yi are defined in such a way that they
preserve all attributes necessary to perform the join.
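The attribute sets Y1 and Y2 follow mechanically from X and the two schemas. A minimal sketch, assuming a natural join (so the join attributes are exactly R1 ∩ R2); the function name and the example schemas, including the `Salary` attribute, are hypothetical.

```python
def pushed_projections(x, r1_schema, r2_schema):
    """Return (Y1, Y2): the attributes each input must keep so that the join
    can still be evaluated and X can still be produced afterwards."""
    join_attrs = r1_schema & r2_schema
    y1 = (x & r1_schema) | join_attrs
    y2 = (x & r2_schema) | join_attrs
    return y1, y2


# Hypothetical schemas joined on the shared attribute EID
y1, y2 = pushed_projections(
    x={"EName"},
    r1_schema={"EID", "EName", "Salary"},
    r2_schema={"EID", "PNo"},
)
assert y1 == {"EID", "EName"}
assert y2 == {"EID"}
```

`Salary` and `PNo` are dropped before the join, while `EID` is kept on both sides although it is not part of X.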





      Algebraic optimization rules

Further rules
        Commutativity of π and ∪:

        \[ \pi_X(r_1 \cup r_2) \iff \pi_X(r_1) \cup \pi_X(r_2) \]

        Distributive law for ⋈ and ∪, distributive law for ⋈ and −
        Commutativity of renaming β with other operators, ...
        Idempotence, e.g., A ∨ A ⇐⇒ A
        Operations involving empty relations
        Commutative and associative laws for ⋈, ∪ and ∩

      Heuristic algebraic optimization – Example

Use algebraic optimization heuristics
        Force join
        Push selection and projection





      Cost-based algebraic query optimization

Most non-distributed RDBMS strongly rely on cost-based optimizations
        Aim for a better optimized plan with respect to system and data
        characteristics
        Join order optimization
        Basic approach
                Establish a cost model for various operations
                Enumerate all query plans and compute costs
                Pick the best query plan
        Usually, dynamic programming techniques are used to keep
        computational effort manageable

      Physical query optimization

Physical optimization
        Input:
        Optimized query plan consisting of algebra operators
        Choose an algorithm to compute a particular algebra operator
        Join:
        Block-Nested-Loop join, hash join, merge join, ...
        Select:
        Full table scan, index lookup, ad-hoc index generation & lookup, ...
Tasks
        Translating a query plan into an execution plan
Physical and algebraic optimization are often interleaved
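The basic approach of cost-based join order optimization (cost model, plan enumeration, dynamic programming) can be sketched as follows. This is an illustrative toy, not the optimizer of any particular system: it assumes left-deep plans only, a cost model that sums estimated intermediate result sizes, and independent join selectivities. The function name, cardinalities, and selectivities are made up for the example.

```python
from itertools import combinations


def best_join_order(card, sel):
    """card: {relation: cardinality}; sel: {frozenset({a, b}): join selectivity}.
    Returns (cost, order) of the cheapest left-deep plan, where cost is the
    sum of the estimated sizes of all intermediate join results."""
    rels = sorted(card)

    def size(subset):
        # Estimated result size: product of cardinalities and selectivities.
        n = 1.0
        for r in subset:
            n *= card[r]
        for a, b in combinations(sorted(subset), 2):
            n *= sel.get(frozenset((a, b)), 1.0)  # 1.0 means cross product
        return n

    # best[S] holds the cheapest plan that joins exactly the relations in S.
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            candidates = []
            for last in subset:
                sub_cost, sub_order = best[subset - {last}]
                candidates.append((sub_cost + size(subset), sub_order + (last,)))
            best[subset] = min(candidates)
    return best[frozenset(rels)]


cost, order = best_join_order(
    card={"A": 1000, "B": 100, "C": 10},
    sel={frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1},
)
print(order, cost)
```

Each subset of relations is solved once and reused, so the enumeration considers 2^n subsets instead of all n! join orders. In the example, joining B and C first is preferred because it yields the smallest intermediate result.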

      Query optimization example

Output: query execution plan

  Basics of distributed query processing
    Phases of distributed query processing

      Workflow for distributed query processing




    Introduction


      Basic considerations

Distributed query processing
        Shares the properties of centralized query processing
        Similar problem but with different objectives and constraints
Objectives for centralized query processing
        Minimize the number of disk accesses
        Minimize computational time
Objectives for distributed query processing
        Minimize resource consumption
        Minimize response time
        Maximize throughput

Costs are more difficult to predict
        Join selectivity: is it worthwhile to push down a selection?
        Data is distributed: difficult to get meaningful statistics
        Network latency is very hard to predict
        Current workload at nodes, load shedding
Additional cost factors and constraints
        Extension of relational algebra (sending/receiving data)
        Data localization (which node holds relevant data)
        Replication and caching (where to compute an operation)
        Network models
        Response-time models
        Data and structural heterogeneity (federated databases ...)


      Consequences

Optimization is much more difficult than in the central case
        Statistics and costs change over time, e.g., workload at a node,
        network load
        More conflicting optimization goals
        Increase throughput → reduce replication and parallelization,
        reduce query response time → increase parallelization
        More cost factors and constraints
Consequences
        Adaptive query plans (create an initial plan and optimize it on-the-fly)
        Do not aim for the best plan, but for a good plan

      Example

Query
        Return the names of all employees working for project 'P1'

        \[ \pi_{EName}\bigl(\pi_{EID, EName}(\text{Employees}) \bowtie_{\text{Employees}.EID = \text{Assignment}.ENo} \pi_{ENo}(\sigma_{PNo = \text{'P1'}}(\text{Assignment}))\bigr) \]

Problems
        Relations are fragmented and distributed among five nodes
        The Employees relation uses primary horizontal fragmentation
        One fragment located at node 1, the other at node 2, no replication
        The Assignment relation uses derived horizontal fragmentation
        One fragment located at node 3, the other at node 4, no replication
        The query originates from node 5



      Example

Cost model and statistics
        Accessing a tuple costs 1 unit (acc)
        Transferring a tuple costs 10 units (trans)
        There are 400 employees and 1000 assignments
        20 assignments for project 'P1'
        All tuples are uniformly distributed, i.e., nodes 3 and 4 provide 10
        assignments for project 'P1' each
        There are local indexes on attribute PNo at nodes 3 and 4 (as well as
        indexes on primary keys at all nodes)
        Direct tuple access is possible on local sites, no scanning
        All nodes can directly communicate with each other
        Simplification: no costs for unions and projections



      Example

Simple execution plan - Version A
Transfer all data to Node 5

Simple execution plan - Version B
Ship intermediate results






      Example

Costs plan A: 23,000 units
Costs plan B: 440 units
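The two totals can be reproduced from the cost model and statistics of the example. The text above gives only the totals; the per-step breakdown below is a reconstruction under stated assumptions: in plan A, node 5 has no indexes, so the selection scans all assignments and the join is a nested loop; in plan B, each selected assignment tuple is shipped to exactly one employee node (possible because of derived horizontal fragmentation) and joined via a primary-key lookup. All constant and function names are illustrative.

```python
ACC, TRANS = 1, 10                    # cost units per tuple access / tuple transfer
EMPLOYEES, ASSIGNMENTS = 400, 1000    # relation sizes
P1_ASSIGNMENTS = 20                   # assignments for project 'P1'


def cost_plan_a():
    """Version A: transfer all data to node 5 and evaluate the query there."""
    transfer = (EMPLOYEES + ASSIGNMENTS) * TRANS   # ship both relations
    selection = ASSIGNMENTS * ACC                  # assumed full scan at node 5
    join = P1_ASSIGNMENTS * EMPLOYEES * ACC        # assumed nested-loop join
    return transfer + selection + join


def cost_plan_b():
    """Version B: select and join locally, ship only intermediate results."""
    selection = P1_ASSIGNMENTS * ACC               # index lookups at nodes 3 and 4
    ship_to_employees = P1_ASSIGNMENTS * TRANS     # selected tuples to nodes 1 and 2
    join = P1_ASSIGNMENTS * ACC                    # primary-key lookups at nodes 1 and 2
    ship_result = P1_ASSIGNMENTS * TRANS           # join result to node 5
    return selection + ship_to_employees + join + ship_result


assert cost_plan_a() == 23_000
assert cost_plan_b() == 440
```

Under these assumptions, plan A is dominated by shipping and scanning complete relations, whereas plan B only ever touches the 20 tuples that belong to project 'P1'.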






      Important aspects of distributed query processing

        Meta data management
        Data localization
        Global query optimization
        Post-processing




    Meta data management


      Workflow for distributed query processing

      Meta data management

Prerequisites to perform query optimization
        Meta data must be available
        Meta data is stored in the catalog
        Catalog provides information about the data distribution
Use this information to decide, for instance, whether it is worthwhile to execute a
selection very early.






      Meta data management

Typical contents of a catalog for distributed database management systems
        Database schema
        Definitions of tables, views, constraints, keys, ...
        Partitioning schema
        Information about how the schema is partitioned and how tables can
        be reconstructed
        Allocation schema
        Information about which fragment can be found at which node
        (including information about replication)
        Network information
        Information about node connections, network model
        Additional physical information
        Information about indexes, data statistics (histograms, etc.),
        hardware resources (processing & storage), ...

Where to store the catalog in a distributed system?
        Central node
        Simple solution, bottleneck
        Replicated at all nodes
        Updates are expensive
        Fragmented
        In rare cases, the catalog may become very large
        Catalog has to be fragmented and allocated
        Caching
        Replicate only needed parts of a central catalog, anticipate potential
        inconsistencies
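To make the catalog contents more concrete, here is a minimal sketch of how the database, partitioning, and allocation schema could be represented. The class and field names are hypothetical, network and physical information are omitted, and the fragment definitions are placeholders rather than real fragmentation expressions. It requires Python 3.9 or later for the built-in generic type hints.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    name: str
    relation: str       # global relation the fragment belongs to
    definition: str     # fragmentation expression (partitioning schema)
    nodes: list[str]    # allocation schema; several nodes would mean replication


@dataclass
class Catalog:
    schema: dict[str, list[str]] = field(default_factory=dict)  # relation -> attributes
    fragments: list[Fragment] = field(default_factory=list)

    def fragments_of(self, relation):
        """Fragments needed to reconstruct a global relation."""
        return [f for f in self.fragments if f.relation == relation]

    def nodes_holding(self, relation):
        """Nodes that store at least one fragment of the relation."""
        return sorted({n for f in self.fragments_of(relation) for n in f.nodes})


catalog = Catalog(
    schema={"Employees": ["EID", "EName"]},
    fragments=[
        Fragment("Employees1", "Employees", "placeholder predicate 1", ["node1"]),
        Fragment("Employees2", "Employees", "placeholder predicate 2", ["node2"]),
    ],
)
assert catalog.nodes_holding("Employees") == ["node1", "node2"]
```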


      Meta data management

Centralized catalog
        One instance of the global catalog at a central node
        Advantages
                No need to update copies
                Little memory consumption
        Disadvantages
                Communication with central node for each query
                Central node potentially represents a bottleneck

Replicated catalog
        Full copy of the global catalog at each node
        Advantages
                Little communication overhead for queries
                Good availability
        Disadvantages
                High update costs






      Meta data management

Fragmented catalog
        Partitioning the global catalog and assigning partitions to nodes
        Advantages
                Sharing load among nodes
                Reducing update overhead
        Disadvantages
                Localizing necessary partitions of the global catalog

Caching catalog data
        Caching non-local catalog data
        Advantages
                Avoiding remote access to frequently needed catalog data
                Reducing communication overhead
        Disadvantages
                Coherency control
                Invalidating cached copies in the presence of updates




      Meta data management

Caching catalog data
        Explicit invalidation
                Owner of catalog data remembers nodes with local copies
                In case of updates: sending an invalidation message to nodes with local
                copies
        Implicit invalidation
                Identifying old catalog data during runtime (adding version numbers
                and time stamps to query messages)

    Data localization

      Workflow for distributed query processing
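Implicit invalidation can be illustrated with version numbers attached to query messages. This is a minimal single-process sketch, assuming that the owner of a catalog entry increments a version counter on every update and rejects queries that were planned with an outdated version. The class names, the message format, and the refresh-and-retry policy are assumptions made for the example.

```python
class CatalogOwner:
    """Node that owns a catalog entry and bumps its version on every update."""

    def __init__(self, value):
        self.value, self.version = value, 1

    def update(self, value):
        self.value, self.version = value, self.version + 1

    def execute(self, query, cached_version):
        # The query message carries the catalog version the sender planned with.
        if cached_version != self.version:
            return ("stale", self.value, self.version)
        return ("ok", f"result of {query}", self.version)


class CachingNode:
    """Node that caches the entry and detects stale data only when it is used."""

    def __init__(self, owner):
        self.owner = owner
        self.cached_value, self.cached_version = owner.value, owner.version

    def run(self, query):
        status, payload, version = self.owner.execute(query, self.cached_version)
        if status == "stale":
            # Refresh the cache, then retry with the current catalog data.
            self.cached_value, self.cached_version = payload, version
            status, payload, version = self.owner.execute(query, self.cached_version)
        return status, payload


owner = CatalogOwner("Employees1 at node 1")
node = CachingNode(owner)
owner.update("Employees1 at node 3")   # no invalidation message is sent
result = node.run("q1")                # staleness is detected at query time
assert result == ("ok", "result of q1")
assert node.cached_value == "Employees1 at node 3"
```

In contrast to explicit invalidation, the owner does not need to remember which nodes hold copies; the price is one extra round trip whenever a stale copy is used.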






      Data localization

Objective
        Creating subqueries in consideration of the data distribution
Assumptions
        Fragmentation is defined by fragmentation expressions
        Each fragment is allocated only at one node (no replication)
        Fragmentation expressions and locations of the fragments are stored
        in the catalog
Main tasks
        Replace access to global relations with accesses to the fragments
        Insert reconstruction expression into algebra query
        Basic algebraic simplifications of the query

      Example – horizontal reduction

Schema
        \( \text{Projects}_1 = \sigma_{\text{Budget} \le 150{,}000}(\text{Projects}) \)
        \( \text{Projects}_2 = \sigma_{150{,}000 < \text{Budget} \le 200{,}000}(\text{Projects}) \)
        \( \text{Projects}_3 = \sigma_{\text{Budget} > 200{,}000}(\text{Projects}) \)
Reconstruction expression (horizontal fragmentation)
        \( \text{Projects} = \text{Projects}_1 \cup \text{Projects}_2 \cup \text{Projects}_3 \)
Example query
        \( \sigma_{\text{Location} = \text{'Saarbr.'} \wedge \text{Budget} \le 100{,}000}(\text{Projects}) \)
After replacing references to global relations
        \( \sigma_{\text{Location} = \text{'Saarbr.'} \wedge \text{Budget} \le 100{,}000}(\text{Projects}_1 \cup \text{Projects}_2 \cup \text{Projects}_3) \)
Further optimization is possible!
             Katja Hose                    Distributed Database Systems   November 10, 2011   63 / 167                Katja Hose                    Distributed Database Systems   November 10, 2011   64 / 167
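The first localization step, replacing a global relation by its reconstruction expression, can be sketched as a small tree rewrite. The nested-tuple encoding of algebra expressions, the `CATALOG` dictionary, and the function name `localize` are assumptions made for this illustration only.

```python
# Hypothetical catalog: global relation -> names of its horizontal fragments.
CATALOG = {"Projects": ["Projects1", "Projects2", "Projects3"]}


def localize(expr, catalog=CATALOG):
    """Replace each global relation by the union of its fragments.

    Expressions are nested tuples: ("select", predicate, child),
    ("union", child, ...), ("join", left, right). A plain string is a
    relation name.
    """
    if isinstance(expr, str):  # relation name
        fragments = catalog.get(expr)
        return ("union", *fragments) if fragments else expr
    op = expr[0]
    if op == "select":  # keep the predicate, rewrite only the child
        return ("select", expr[1], localize(expr[2], catalog))
    # For union and join, every argument is a child expression.
    return (op, *(localize(child, catalog) for child in expr[1:]))


query = ("select", "Location = 'Saarbr.' AND Budget <= 100000", "Projects")
localized = localize(query)
```

Relations without a catalog entry are left untouched, so the same function can be applied to queries that mix fragmented and unfragmented relations.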


      Query simplification – horizontal reduction

Objective
        Eliminate unnecessary subqueries

Horizontal reduction rule
        Given fragments of R as F_R = {R_1, ..., R_n} with R_i = σ_{p_i}(R)
        All fragments R_i for which σ_{p_s}(R_i) = ∅ can be removed,
        with p_s denoting the query's selection predicate
        σ_{p_s}(R_i) = ∅ ⇐ ∀x ∈ R : ¬(p_s(x) ∧ p_i(x))
        The selection with the query predicate p_s on fragment R_i is empty if
        p_s contradicts the fragmentation predicate p_i of R_i, i.e., p_s and p_i
        are never both true for the same tuple

      Example – horizontal reduction

Query with fragmentation expression
        σ_{Location = 'Saarbr.' ∧ Budget ≤ 100.000}(Projects1 ∪ Projects2 ∪ Projects3)

Fragment definitions
        Projects1 = σ_{Budget ≤ 150.000}(Projects)
        Projects2 = σ_{150.000 < Budget ≤ 200.000}(Projects)
        Projects3 = σ_{Budget > 200.000}(Projects)

Because of
        σ_{Budget ≤ 100.000}(Projects2) = ∅, σ_{Budget ≤ 100.000}(Projects3) = ∅

We obtain the reduced query
        σ_{Location = 'Saarbr.'}(σ_{Budget ≤ 100.000}(Projects1))
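For range predicates on a single attribute, the contradiction test of the horizontal reduction rule amounts to an interval intersection. The sketch below assumes that every predicate on Budget has the form lo < Budget ≤ hi; the names `FRAGMENTS`, `contradicts`, and `reduce_horizontal` are made up for this example.

```python
import math

# Fragmentation predicates on Budget, encoded as (lo, hi) meaning
# lo < Budget <= hi. This encoding is an assumption of the sketch.
FRAGMENTS = {
    "Projects1": (-math.inf, 150_000),
    "Projects2": (150_000, 200_000),
    "Projects3": (200_000, math.inf),
}


def contradicts(p, q):
    """True if no Budget value can satisfy both range predicates."""
    lo = max(p[0], q[0])
    hi = min(p[1], q[1])
    return lo >= hi  # the intersection (lo, hi] is empty


def reduce_horizontal(query_range, fragments=FRAGMENTS):
    """Names of the fragments that can contribute tuples to the query."""
    return [name for name, pred in fragments.items()
            if not contradicts(query_range, pred)]


# Budget <= 100.000 from the example query
relevant = reduce_horizontal((-math.inf, 100_000))
```

General predicates need a real satisfiability check. Intervals are used here only because they match the Budget example.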




      Query simplification – join reduction

Join reductions
        Larger joins are replaced by multiple partial joins on fragments
        Distributive law: (R_1 ∪ R_2) ⋈ S = (R_1 ⋈ S) ∪ (R_2 ⋈ S)
        Eliminate all union fragments that will return an empty result

Expectations
        Elimination of partial joins producing empty results
                Depends on fragmentation optimality
        Many joins on small relations have lower resource costs than one large
        join
                Depends on fragmentation and applied join algorithms
        Smaller joins can be executed in parallel
                Might decrease response time but might also increase communication
                costs

      Example – join reduction

Schema
        Projects(PNo, PName, Budget, Location)
                Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects)
                Projects2 = σ_{PNo = 'P3'}(Projects)
                Projects3 = σ_{PNo = 'P4'}(Projects)
        Assignment(ENo, PNo, Duration)
                Assignment1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Assignment)
                Assignment2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Assignment)

Example query
        select * from Projects p, Assignment a where p.PNo = a.PNo
In relational algebra
        Projects ⋈ Assignment


      Example – join reduction

Query
        Projects ⋈ Assignment

After replacing global relations with reconstruction expressions
        (Projects1 ∪ Projects2 ∪ Projects3) ⋈ (Assignment1 ∪ Assignment2)

After applying the distributive law
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
        (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)

Further optimization is possible!

      Query simplification – join reduction

Join reduction rule
        Given fragments of R as F_R = {R_1, ..., R_n} and fragments of S as
        F_S = {S_1, ..., S_m}, with fragmentation predicates p_i for R_i and
        q_j for S_j
        Apply the distributive law, e.g.:
        (R_1 ∪ R_2) ⋈ (S_1 ∪ S_2) = (R_1 ⋈ S_1) ∪ (R_1 ⋈ S_2) ∪ (R_2 ⋈ S_1) ∪ (R_2 ⋈ S_2)
        All partial joins between fragments R_i and S_j for which R_i ⋈ S_j = ∅
        can be removed
        R_i ⋈ S_j = ∅ ⇐ ∀x ∈ R, y ∈ S with equal join attribute values : ¬(p_i(x) ∧ q_j(y))
        The join between fragments R_i and S_j is empty if their respective
        fragmentation predicates (on the join attribute) contradict, i.e., there
        is no combination of joining tuples x and y such that both
        fragmentation predicates are fulfilled at the same time.
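The distributive law can be checked on a toy instance. The tuples below are invented purely for illustration, and relations are reduced to (PNo, PName) and (ENo, PNo) to keep the sketch short; `join` is a hypothetical helper, not part of any library.

```python
def join(projects, assignment):
    """Natural join on PNo; tuples are (PNo, PName) and (ENo, PNo)."""
    return {(pno, pname, eno)
            for (pno, pname) in projects
            for (eno, a_pno) in assignment
            if pno == a_pno}


# Made-up fragments that follow the fragmentation predicates on PNo.
project_frags = [
    {("P1", "DB"), ("P2", "IR")},   # Projects1
    {("P3", "ML")},                 # Projects2
    {("P4", "HCI")},                # Projects3
]
assignment_frags = [
    {("E1", "P1"), ("E2", "P2")},   # Assignment1
    {("E3", "P3"), ("E1", "P4")},   # Assignment2
]

# Join of the reconstructed global relations.
whole = join(set().union(*project_frags), set().union(*assignment_frags))

# Union of all six partial joins between fragments.
partial_joins = [join(p, a) for p in project_frags for a in assignment_frags]
distributed = set().union(*partial_joins)

non_empty = sum(1 for result in partial_joins if result)
```

On this data only three of the six partial joins return tuples, which is what the join reduction rule exploits.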




      Example – join reduction

Query with fragmentation expression
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
        (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)

Some of these partial joins are empty, e.g.:
        Projects1 ⋈ Assignment2 = ∅
Because their fragmentation expressions contradict:
        Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects) and
        Assignment2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Assignment)
Reduced query
        (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
        (Projects3 ⋈ Assignment2)

      Query simplification – join reduction for horizontal
      fragmentation

The easiest join reduction case follows from derived horizontal
fragmentation
        For each fragment of the first relation, there is exactly one matching
        fragment of the second relation
        Simply use the information contained in the reconstruction expression
        instead of comparing the fragmentation predicates to each other
Join reduction for arbitrary horizontal partitioning might not be beneficial
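When every fragmentation predicate is a disjunction of equality tests on the join attribute, the contradiction test reduces to checking whether two value sets are disjoint. The following sketch assumes exactly that restricted predicate form; the dictionaries and the function name `reduced_joins` are illustrative.

```python
# Each fragment is described by the set of PNo values its predicate allows.
PROJECT_FRAGMENTS = {
    "Projects1": {"P1", "P2"},
    "Projects2": {"P3"},
    "Projects3": {"P4"},
}
ASSIGNMENT_FRAGMENTS = {
    "Assignment1": {"P1", "P2"},
    "Assignment2": {"P3", "P4"},
}


def reduced_joins(r_frags, s_frags):
    """Fragment pairs whose predicates on the join attribute do not contradict."""
    return [(r, s)
            for r, r_values in r_frags.items()
            for s, s_values in s_frags.items()
            if r_values & s_values]  # a shared PNo value means a possible match


pairs = reduced_joins(PROJECT_FRAGMENTS, ASSIGNMENT_FRAGMENTS)
```

The surviving pairs correspond to the three partial joins of the reduced query above.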

      Query simplification – join reduction for derived
      horizontal fragmentation

Example
        Projects(PNo, PName, Budget, Location)
                Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects)
                Projects2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Projects)

        Assignment(ENo, PNo, Duration)
                Assignment1 = Assignment ⋉ Projects1
                Assignment2 = Assignment ⋉ Projects2

Query in relational algebra
        Projects ⋈ Assignment

After replacing global relations with reconstruction expressions
        (Projects1 ∪ Projects2) ⋈ (Assignment1 ∪ Assignment2)

After applying the distributive law
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)

Reduced query (using information about fragmentation of relation Assignment
directly)
        (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)
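A small sketch of why derived horizontal fragmentation makes join reduction trivial: each Assignment fragment is built by a semijoin with one Projects fragment, so only the matching pairs can produce results. The sample tuples and the helpers `join` and `semijoin` are assumptions of this illustration.

```python
def join(projects, assignment):
    """Natural join on PNo; tuples are (PNo, PName) and (ENo, PNo)."""
    return {(pno, pname, eno)
            for (pno, pname) in projects
            for (eno, a_pno) in assignment
            if pno == a_pno}


def semijoin(assignment, projects):
    """Assignment semijoin Projects: assignment tuples that have a join partner."""
    pnos = {pno for (pno, _pname) in projects}
    return {t for t in assignment if t[1] in pnos}


# Made-up sample tuples, only for illustration.
projects1 = {("P1", "DB"), ("P2", "IR")}
projects2 = {("P3", "ML"), ("P4", "HCI")}
assignment = {("E1", "P1"), ("E1", "P2"), ("E2", "P3"), ("E3", "P4")}

# Derived horizontal fragmentation of Assignment.
assignment1 = semijoin(assignment, projects1)
assignment2 = semijoin(assignment, projects2)

full = join(projects1 | projects2, assignment)
reduced = join(projects1, assignment1) | join(projects2, assignment2)
```

The cross pairs, such as `join(projects1, assignment2)`, are empty by construction, so no predicate comparison is needed to discard them.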




      Query simplification – vertical reduction

Vertical fragmentation rule
        Given fragments of R as F_R = {R_1, ..., R_n} with R_i = π_{β_i}(R), with
        β_i representing the enumeration of a subset of R's attributes
        Avoid joining fragments containing "useless" attributes, i.e.,
        fragments containing only attributes that are not referenced in the
        query and not output in the result

      Example – vertical reduction

Schema
        Projects(PNo, PName, Budget, Location)
                Projects1 = π_{PNo, PName, Location}(Projects)
                Projects2 = π_{PNo, Budget}(Projects)
Reconstruction expression
        Projects = Projects1 ⋈ Projects2
Example query
        π_{PName}(Projects)
After replacing references to global relations
        π_{PName}(Projects1 ⋈ Projects2)
After removing unnecessary fragments
        π_{PName}(Projects1)
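Vertical reduction can be sketched as an attribute-set test. The sketch assumes that the key PNo is replicated in every vertical fragment (as in the example schema) and therefore never makes a fragment "useful" on its own; the function name `reduce_vertical` is hypothetical.

```python
KEY = {"PNo"}  # assumed to be replicated in every vertical fragment
FRAGMENTS = {
    "Projects1": {"PNo", "PName", "Location"},
    "Projects2": {"PNo", "Budget"},
}


def reduce_vertical(needed, fragments=FRAGMENTS, key=KEY):
    """Fragments that contribute at least one needed non-key attribute."""
    kept = [name for name, attrs in fragments.items()
            if (attrs - key) & needed]
    if not kept:
        # Only key attributes are requested: any single fragment suffices.
        kept = [next(iter(fragments))]
    return kept


# pi_PName(Projects) references only PName
relevant = reduce_vertical({"PName"})
```

A query that references both PName and Budget keeps both fragments, so the join of the reconstruction expression remains in the plan.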


      Query simplification – hybrid fragmentation

        The reconstruction expression introduces combinations of joins and
        unions
        General guidelines
                Remove empty relations generated by contradicting selections on
                horizontal fragments
                Remove useless relations generated by vertical fragments
                Break and distribute joins, eliminate empty fragment joins

      Qualified relations

        Supporting algebraic optimization of queries involving fragments
        Annotating fragments and intermediate relations with predicates
        Estimating the size of a relation
        Extension of relational algebra

Definition: qualified relation
A qualified relation is a pair [R : q_R] where R is a relation and q_R is a
predicate.

Example
Representing horizontal fragments as qualified relations where the
qualification predicate corresponds to the fragmentation expression

        [Projects1 : PNo = 'P1' ∨ PNo = 'P2']




      Qualified relations

Extended relational algebra

(1)     E   :=      σ_F [R : q_R]                  → [E : F ∧ q_R]
(2)     E   :=      π_A [R : q_R]                  → [E : q_R]
(3)     E   :=      [R : q_R] × [S : q_S]          → [E : q_R ∧ q_S]
(4)     E   :=      [R : q_R] − [S : q_S]          → [E : q_R]
(5)     E   :=      [R : q_R] ∪ [S : q_S]          → [E : q_R ∨ q_S]
(6)     E   :=      [R : q_R] ⋈_F [S : q_S]        → [E : q_R ∧ q_S ∧ F]

      Qualified relations

Example query
        σ_{100.000 ≤ Budget ≤ 200.000}(Projects)
Qualified relations
        E1  =  σ_{100.000 ≤ Budget ≤ 200.000} [Projects1 : Budget ≤ 150.000]
               [E1 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget ≤ 150.000)]
               [E1 : 100.000 ≤ Budget ≤ 150.000]
        E2  =  σ_{100.000 ≤ Budget ≤ 200.000} [Projects2 : 150.000 < Budget ≤ 200.000]
               [E2 : (100.000 ≤ Budget ≤ 200.000) ∧ (150.000 < Budget ≤ 200.000)]
               [E2 : 150.000 < Budget ≤ 200.000]
        E3  =  σ_{100.000 ≤ Budget ≤ 200.000} [Projects3 : Budget > 200.000]
               [E3 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget > 200.000)]
               E3 = ∅
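Rule (1) of the extended algebra and the emptiness test used for E3 can be sketched for range qualifications on a single attribute. The `Interval` class, its field names, and the `select` helper are assumptions of this example; real qualifications are arbitrary predicates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Range predicate on Budget; *_open marks a strict bound."""
    lo: float = -math.inf
    hi: float = math.inf
    lo_open: bool = False
    hi_open: bool = False

    def __and__(self, other):
        # Conjunction: keep the tighter bound on each side. On equal
        # values the strict (open) bound is the tighter one.
        lo, lo_open = max((self.lo, self.lo_open), (other.lo, other.lo_open))
        hi, hi_closed = min((self.hi, not self.hi_open),
                            (other.hi, not other.hi_open))
        return Interval(lo, hi, lo_open, not hi_closed)

    def is_empty(self):
        return self.lo > self.hi or (
            self.lo == self.hi and (self.lo_open or self.hi_open))


def select(f, qualified):
    """Rule (1): sigma_F [R : q_R] -> [E : F and q_R]."""
    name, q = qualified
    return (name, f & q)


query = Interval(100_000, 200_000)  # 100.000 <= Budget <= 200.000
e1 = select(query, ("Projects1", Interval(hi=150_000)))
e2 = select(query, ("Projects2", Interval(150_000, 200_000, lo_open=True)))
e3 = select(query, ("Projects3", Interval(lo=200_000, lo_open=True)))
```

Here `e3[1].is_empty()` holds, so the subquery on Projects3 can be dropped without accessing the fragment.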



1    Motivation
2    Detour on centralized query processing
       Translating SQL into relational algebra
       Phases of centralized query processing
       Query parsing
       Query transformation
       Query optimization
3    Basics of distributed query processing
       Phases of distributed query processing
       Introduction
       Meta data management
       Data localization
4    Global query optimization
       Main questions
       Global query optimizer
       Distributed cost model
       Join order optimization
       Total time models
       Response time models
5    Summary




      Workflow for distributed query processing

[Figure: workflow for distributed query processing]

      Introduction to global query optimization

Main questions
        When to optimize?
        What criteria to optimize?
        Where to execute the query?


      When to optimize?

Full compile time optimization
        The full query execution plan is computed at compile time
        Assumption
                Applications use canned queries
                Prepared and parameterized SQL statements
        Pros
                Queries can be executed directly
        Cons
                Complex to model
                Much information unknown or too expensive to gather
                Collecting statistics on all nodes?
                Statistics outdated
                Especially machine load and network properties are very volatile

      When to optimize?

Fully dynamic optimization
        Each query is optimized individually at runtime
        This technique heavily relies on heuristics, learning algorithms, and
        luck
        Pros
                Might produce very good plans
                Uses current network state
                Also usable for ad-hoc queries
        Cons
                Result quality might be very unpredictable
                Complex algorithms and heuristics
                Difficult to keep statistics up-to-date




      When to optimize?

Semi-dynamic optimization
        Pre-optimize the query
        During query execution, test whether execution proceeds as predicted
        during optimization,
        e.g., are tuples/fragments delivered in time? Does the network adhere
        to the predicted properties? Are there any bad network latencies?
        If execution shows severe deviations, compute a new query plan for all
        parts that have not yet been executed
Only makes sense for long-running queries

      When to optimize?

Hierarchical optimization
        Plans are created in multiple stages
        Global-Local-Plans
                Global query optimizer creates a global query plan
                Focus on data transfer: which intermediate results are to be
                computed by which node? How should intermediate results be shipped?
                Local query optimizers create local query plans
                Decide on query plan layout, algorithms, indexes, etc. to deliver
                the requested intermediate result
        Two-Step-Plans


      When to optimize?                                                                                     What criteria to optimize?
Hierarchical optimization                                                                             Important aspects for global optimization
        Plans are created in multiple stages
                                                                                                              Communication operators
        Global-Local-Plans
        Two-Step-Plans                                                                                        Fragment cardinalities
               During compile time, only stable parts of the plan are computed                                Order of operations
               Join order, join methods, access paths, etc.                                                   Join ordering
               During query execution, all missing plan elements are added                                    Because permutations of the joins within the query may lead to
               Node selection, transfer policies, etc.
               Both steps can be performed using traditional query optimization                               improvements of orders of magnitude
               techniques                                                                             Most important alternative optimization criteria
                         Plan enumeration with dynamic programming
                         Complexity is manageable as each optimization problem is much easier                 Query response time
                         than a full optimization                                                             Resource consumption
                         During runtime optimization, fresh statistics are available
                                                                                                              Total query execution costs
Most distributed database management systems use semi-dynamic or
hierarchical optimization techniques (or both)                                                                ...
            Katja Hose                  Distributed Database Systems   November 17, 2011   89 / 167               Katja Hose           Distributed Database Systems   November 17, 2011   90 / 167
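The two-step scheme above can be illustrated with a small sketch. It is not taken from the lecture: the relation names, cardinalities, selectivities, replica placement, and the cost measure (sum of intermediate result sizes, left-deep plans only) are all assumptions chosen to keep the example short. Step 1 fixes the join order at compile time using dynamic programming; step 2 adds node selection at runtime from fresh load figures.

```python
from itertools import combinations

# Hypothetical statistics (assumptions, not from the slides).
CARD = {"R": 1000, "S": 100, "T": 10}                      # relation cardinalities
SEL = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.1}  # join selectivities
REPLICAS = {"R": ["n1", "n2"], "S": ["n2"], "T": ["n1", "n3"]}   # replica placement


def result_size(rels):
    """Estimated cardinality of joining all relations in `rels`."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for pair, s in SEL.items():
        if pair <= rels:          # join predicate applies inside this subset
            size *= s
    return size


def compile_time_join_order(relations):
    """Step 1: dynamic programming over subsets, left-deep plans only.

    Cost of a plan = sum of the sizes of all join results it produces.
    Returns (cost, join_order).
    """
    best = {frozenset({r}): (0.0, (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            size = result_size(subset)
            best[subset] = min(
                (best[subset - {last}][0] + size,
                 best[subset - {last}][1] + (last,))
                for last in subset
            )
    return best[frozenset(relations)]


def runtime_node_selection(order, load):
    """Step 2: bind each relation access to the least-loaded replica node."""
    return [(rel, min(REPLICAS[rel], key=lambda n: load[n])) for rel in order]


cost, order = compile_time_join_order(["R", "S", "T"])
plan = runtime_node_selection(order, {"n1": 0.9, "n2": 0.2, "n3": 0.4})
print(order, plan)
```

With these made-up statistics, joining S and T first is much cheaper than starting with the cross product of R and T, which is the kind of difference that makes join ordering important. The runtime step only looks at current load, so it stays cheap compared with a full re-optimization.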




  Global query optimization
    Main questions

      Where to execute the query?

        Query optimizer has to decide which parts of the query have to be
        shipped to which node (cost model)
        In heavily replicated scenarios, clever hybrid shipping can effectively
        be used for load balancing
        Move expensive computations to lightly loaded nodes, avoid
        expensive communication

  Global query optimization
    Main questions

      Global query optimization

Global query optimization...
... deals with finding the “best” ordering of operations in the query
(extended by fragmentation expressions and including communication
operations) that minimizes a cost function.
        Input
        an algebraic query extended by fragmentation expressions
        Output
        an algebraic query or query execution plan with communication
        operations
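As a rough illustration of how a cost function can drive the choice of execution site, the sketch below evaluates a single join at each candidate node. Everything in it is an assumption made for the example: the `Node` type, the load metric, the cost constants, and the simplification that transfers from different nodes run in parallel. It reports both total cost and response time, two of the optimization criteria named earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    load: float  # hypothetical metric: 0.0 = idle, must stay below 1.0


# Hypothetical cost-model constants (assumptions, not from the slides).
MSG_COST = 1.0          # fixed cost per transfer
COST_PER_TUPLE = 0.01   # communication cost per shipped tuple
CPU_PER_TUPLE = 0.001   # processing cost per input tuple on an idle node


def join_costs(site, inputs):
    """Return (total_cost, response_time) for running the join at `site`.

    inputs: list of (node_holding_fragment, cardinality) pairs.
    Fragments already stored at `site` need no transfer.
    """
    transfers = [MSG_COST + COST_PER_TUPLE * card
                 for node, card in inputs if node != site]
    # Processing gets slower the more loaded the chosen node is.
    cpu = CPU_PER_TUPLE * sum(card for _, card in inputs) / (1.0 - site.load)
    total_cost = sum(transfers) + cpu                   # all work added up
    response_time = max(transfers, default=0.0) + cpu   # parallel transfers
    return total_cost, response_time


def choose_site(candidates, inputs, criterion="total"):
    """Pick the candidate node that minimizes the chosen criterion."""
    idx = 0 if criterion == "total" else 1
    return min(candidates, key=lambda s: join_costs(s, inputs)[idx])


a, b, c = Node("a", 0.9), Node("b", 0.1), Node("c", 0.0)
inputs = [(a, 10_000), (b, 2_000)]   # R is stored at a, S at b
print(choose_site([a, b, c], inputs, "total").name)
print(choose_site([a, b, c], inputs, "response").name)
```

In this made-up setting, minimizing total cost selects node b, because only one relation has to be shipped. Minimizing response time selects the idle node c, even though both inputs must be shipped there. The heavily loaded node a loses under both criteria, which matches the advice to move expensive computations to lightly loaded nodes.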
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Distributed_Database_System

  • 1. Distributed Database Systems: Distributed Query Processing
    Katja Hose, Ralf Schenkel. Max-Planck-Institut für Informatik, Cluster of Excellence MMCI. November 10, 2011 and November 17, 2011.

    Contents
      1 Motivation
      2 Detour on centralized query processing: Translating SQL into relational algebra; Phases of centralized query processing; Query parsing; Query transformation; Query optimization
      3 Basics of distributed query processing: Phases of distributed query processing; Introduction; Meta data management; Data localization
      4 Global query optimization: Main questions; Global query optimizer; Distributed cost model; Join order optimization; Total time models; Response time models
      5 Summary

    Motivation
      The task of query processing is to answer user queries.
      Example: How many students are at Saarland University? Answer: 18,000
      Additional constraints: low response times, high query throughput, efficient hardware usage, ...
  • 2. Motivation
      Differences to centralized query processing
        Considering the physical data distribution during query optimization
        Considering communication costs
      Assumptions
        Data is distributed among multiple nodes
        Existence of a global conceptual schema, which is used by all nodes
        Queries are formulated on the global schema

    Detour on centralized query processing: Translating SQL into relational algebra
      SQL query structure: select distinct $a_1, \dots, a_n$ from $R_1, \dots, R_k$ where $p$
      Algorithm, step 1: Translating the from clause
        Let $R_1, \dots, R_k$ be the relations in the from clause of the query
        Construct expression: $R = R_1$ if $k = 1$, otherwise $R = ((\dots(R_1 \times R_2) \times \dots) \times R_k)$
  • 3. Translating SQL into relational algebra (continued)
      Algorithm, step 2: Translating the where clause
        Let $F$ be the predicate in the where clause of the query (if a where clause exists)
        Construct expression: $W = R$ if there is no where clause, otherwise $W = \sigma_F(R)$
      Algorithm, step 3: Translating the select clause
        Let $a_1, \dots, a_n$ (or "*") be the projection in the select clause of the query
        Construct expression: $S = W$ if the projection is "*", otherwise $S = \pi_{a_1,\dots,a_n}(W)$
        Output: $S$
      Example query
        select distinct e.EName, s.Salary from Employees e, Salary s where e.Title = s.Title and s.Salary >= 60000
        Step 1: $R = \text{Employees} \times \text{Salary}$
        Step 2: $W = \sigma_F(R)$, with $F$ the predicate of the where clause

    Phases of centralized query processing
      [Figure: workflow for centralized query processing]
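The three translation steps can be sketched in a few lines of Python. This is only an illustration: the function name `translate` and the bracketed text notation for σ and π are assumptions made for this sketch, not part of the slides.

```python
def translate(select_list, relations, where=None):
    """Build a relational algebra expression (as text) for a simple SQL query."""
    if not relations:
        raise ValueError("the from clause needs at least one relation")

    # Step 1: from clause -> left-deep chain of Cartesian products.
    r = relations[0]
    for i, rel in enumerate(relations[1:]):
        r = f"{r} × {rel}" if i == 0 else f"({r}) × {rel}"

    # Step 2: where clause -> selection, only if a predicate is given.
    w = f"σ[{where}]({r})" if where else r

    # Step 3: select clause -> projection, unless the projection is "*".
    if select_list == ["*"]:
        return w
    return f"π[{', '.join(select_list)}]({w})"


# The example query from the slides.
print(translate(["e.EName", "s.Salary"],
                ["Employees", "Salary"],
                "e.Title = s.Title ∧ s.Salary ≥ 60000"))
```

The result is the naive plan: a Cartesian product at the bottom, one selection above it, and one projection on top. The later rewriting steps start from exactly this shape.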
  • 4. Query parsing
      Transform a declarative query into an internal representation
        Query formulated using a declarative query language, e.g., SQL
        The parser translates the query into an internal representation, called the naive query plan
        Plan described by an operator tree of relational algebra operators
      Example
        Database managing information about employees and projects
          Employees(EID, EName, Title)
          Assignment(ENo, PNo, Duration)
        Query: return the names of all employees working for project 'P1'
          SELECT EName FROM Employees e, Assignment a WHERE e.EID = ENo AND PNo = 'P1'
        Translation into relational algebra
          $\pi_{EName}(\sigma_{PNo = \text{'P1'} \,\wedge\, Employees.EID = Assignment.ENo}(Employees \times Assignment))$
        In contrast to the SQL statement, the algebra statement already contains the required basic evaluation operators
        [Figure: operator tree of this expression]
  • 5. Query transformation
      [Figure: workflow for centralized query processing]
      Steps
        1 Name resolution: transforming object names into internal names
        2 Semantic analysis: checking for global relations and attributes, view expansion, global access control
        3 Normalization: transforming predicates into a canonical format
        4 Simple algebraic rewriting: application of heuristics to eliminate bad plans
      Semantic analysis
        Check if the global schema defines all attributes and relations referenced in the query
        If the query is formulated on a view, replace references to relations/attributes with references to global relations/attributes
        Perform simple integrity checks, e.g., are the attributes used in comparison predicates of the same type?
        Initial check if the query has the rights to access referenced relations/attributes
      Normalization
        Objective: simplification of the following optimization by transforming the query into a canonical format
        Selection and join predicates: conjunctive normal form vs. disjunctive normal form
        Conjunctive normal form: $(p_{11} \vee p_{12} \vee \dots \vee p_{1n}) \wedge \dots \wedge (p_{m1} \vee p_{m2} \vee \dots \vee p_{mn})$
        Disjunctive normal form: $(p_{11} \wedge p_{12} \wedge \dots \wedge p_{1n}) \vee \dots \vee (p_{m1} \wedge p_{m2} \wedge \dots \wedge p_{mn})$
        Transformation based on equivalence rules for logical operators
  • 6. Normalization (continued)
      Equivalence rules
        $p_1 \wedge p_2 \iff p_2 \wedge p_1$ and $p_1 \vee p_2 \iff p_2 \vee p_1$
        $p_1 \wedge (p_2 \wedge p_3) \iff (p_1 \wedge p_2) \wedge p_3$ and $p_1 \vee (p_2 \vee p_3) \iff (p_1 \vee p_2) \vee p_3$
        $p_1 \wedge (p_2 \vee p_3) \iff (p_1 \wedge p_2) \vee (p_1 \wedge p_3)$ and $p_1 \vee (p_2 \wedge p_3) \iff (p_1 \vee p_2) \wedge (p_1 \vee p_3)$
        $\neg(p_1 \wedge p_2) \iff \neg p_1 \vee \neg p_2$ and $\neg(p_1 \vee p_2) \iff \neg p_1 \wedge \neg p_2$
        $\neg(\neg p_1) \iff p_1$
      Example
        SELECT EName FROM Employees e, Assignment a WHERE e.EID = a.ENo AND Duration >= 3 AND (PNo = 'P1' OR PNo = 'P2')
        Selection condition in disjunctive normal form:
          (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P1') ∨ (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P2')
        Selection condition in conjunctive normal form:
          EID = ENo ∧ Duration ≥ 3 ∧ (PNo = 'P1' ∨ PNo = 'P2')

    Simple algebraic rewriting
      Simple optimizations that are always beneficial regardless of system state
        Elimination of redundant predicates
        Simplification of expressions
        Unnesting of subqueries and views
      Tasks
        Recognize and simplify all expressions/operations/subqueries that are "obviously" unnecessary, redundant, or contradictory
        Do not consider system state information, e.g., size of tables, existence of indexes, etc.

    Query optimization
      [Figure: workflow for centralized query processing]
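That the disjunctive and conjunctive forms of the normalization example select the same tuples follows from the distributive law. As a quick sanity check, the sketch below compares both forms over all truth assignments. The four boolean parameters stand for the atomic predicates EID = ENo, Duration ≥ 3, PNo = 'P1' and PNo = 'P2'; the function names are chosen for this illustration only.

```python
from itertools import product


def dnf(join, dur, p1, p2):
    # (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P1') ∨ (EID = ENo ∧ Duration ≥ 3 ∧ PNo = 'P2')
    return (join and dur and p1) or (join and dur and p2)


def cnf(join, dur, p1, p2):
    # EID = ENo ∧ Duration ≥ 3 ∧ (PNo = 'P1' ∨ PNo = 'P2')
    return join and dur and (p1 or p2)


def equivalent(f, g, arity):
    """True if f and g agree on every combination of truth values."""
    return all(f(*values) == g(*values)
               for values in product([False, True], repeat=arity))


print(equivalent(dnf, cnf, 4))  # True
```

A truth table is exponential in the number of atomic predicates, so this only suits small examples. Normalization in an optimizer applies the equivalence rules symbolically instead.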
  • 7. Query optimization
      Steps
        1 Algebraic optimization: find a good relational algebra operator tree
            Heuristic query optimization
            Cost-based query optimization
            Statistical query optimization
        2 Physical optimization: find suitable algorithms for implementing the operations
      Heuristics
        Use simple heuristics which usually lead to better performance
        The optimal plan is not required, but the really bad ones should be avoided
        Break selections: complex selection criteria should be broken into multiple parts
        Push projection and push selection: cheap selections and projections should be performed as early as possible to reduce the sizes of intermediate results
        Force joins: in most cases, using a join is much cheaper than using a Cartesian product and a selection
      Algebraic optimization rules
        Operator $\bowtie$ is commutative: $r_1 \bowtie r_2 \iff r_2 \bowtie r_1$
        Operator $\bowtie$ is associative: $(r_1 \bowtie r_2) \bowtie r_3 \iff r_1 \bowtie (r_2 \bowtie r_3)$
        Multiple selections $\sigma$ can be combined using logical and ($\wedge$). The order of the selections is arbitrary (exploiting commutativity of $\wedge$):
          $\sigma_{F_1}(\sigma_{F_2}(r_1)) \iff \sigma_{F_1 \wedge F_2}(r_1) \iff \sigma_{F_2}(\sigma_{F_1}(r_1))$
        For operator $\pi$ in combination with another operator $\pi$, the "outer" parameter dominates the "inner" one:
          $\pi_X(\pi_Y(r_1)) \iff \pi_X(r_1)$ if $X \subseteq Y$
  • 8. Algebraic optimization rules (continued)
      Operators $\sigma$ and $\bowtie$ commute if all selection attributes are contained in the same relation:
        $\sigma_F(r_1 \bowtie r_2) \iff \sigma_F(r_1) \bowtie r_2$ if $attr(F) \subseteq R_1$
      A selection predicate can be split up in conjunction with a join ($F = F_1 \wedge F_2$) if the attributes referred to by $F_1$ and $F_2$ are contained in different relations:
        $\sigma_F(r_1 \bowtie r_2) \iff \sigma_{F_1}(r_1) \bowtie \sigma_{F_2}(r_2)$ if $attr(F_1) \subseteq R_1$ and $attr(F_2) \subseteq R_2$
      In any case, part of a selection can be split up by separating predicates $F_1$ referencing attributes of $R_1$ only; $F_2$ contains the remaining predicates referencing attributes of both relations:
        $\sigma_F(r_1 \bowtie r_2) \iff \sigma_{F_2}(\sigma_{F_1}(r_1) \bowtie r_2)$ if $attr(F_1) \subseteq R_1$
      Operators $\pi$ and $\sigma$ commute if predicate $F$ is defined based on the projection attributes:
        $\sigma_F(\pi_X(r_1)) \iff \pi_X(\sigma_F(r_1))$ if $attr(F) \subseteq X$
      Alternatively, a change in ordering is possible if the projection is extended by all necessary attributes:
        $\pi_{X_1}(\sigma_F(r_1)) \iff \pi_{X_1}(\sigma_F(\pi_{X_1 \cup X_2}(r_1)))$ if $attr(F) \subseteq X_1 \cup X_2$
      Commutativity of $\sigma$ and $\cup$:
        $\sigma_F(r_1 \cup r_2) \iff \sigma_F(r_1) \cup \sigma_F(r_2)$
      Commutativity of $\sigma$ and $-$:
        $\sigma_F(r_1 - r_2) \iff \sigma_F(r_1) - \sigma_F(r_2)$
        or, in case $F$ only references tuples in $r_1$: $\sigma_F(r_1 - r_2) \iff \sigma_F(r_1) - r_2$
      Commutativity of $\pi$ and $\bowtie$:
        $\pi_X(r_1 \bowtie r_2) \iff \pi_X(\pi_{Y_1}(r_1) \bowtie \pi_{Y_2}(r_2))$
        with $Y_1 = (X \cap R_1) \cup (R_1 \cap R_2)$ and $Y_2 = (X \cap R_2) \cup (R_1 \cap R_2)$
        Pushing a projection is possible if all $Y_i$ are defined in such a way that they preserve all attributes necessary to perform the join.
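The rule that a selection commutes with a join when its predicate only touches one input can be checked on a toy instance. The sketch below uses the Employees/Assignment schema from the parsing example; the tuples themselves are invented for illustration, and `select`, `join` and `canon` are hypothetical helper names.

```python
def select(rel, pred):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def join(r1, r2, pred):
    """Theta join: concatenate every pair of tuples that satisfies pred."""
    return [{**a, **b} for a in r1 for b in r2 if pred(a, b)]


def canon(rel):
    """Order-independent form, so two results can be compared as sets of tuples."""
    return sorted(tuple(sorted(t.items())) for t in rel)


# Invented sample data (not from the slides).
employees = [
    {"EID": 1, "EName": "Ann", "Title": "Engineer"},
    {"EID": 2, "EName": "Bob", "Title": "Analyst"},
]
assignment = [
    {"ENo": 1, "PNo": "P1", "Duration": 12},
    {"ENo": 2, "PNo": "P2", "Duration": 6},
    {"ENo": 2, "PNo": "P1", "Duration": 3},
]


def on_key(e, a):
    return e["EID"] == a["ENo"]


def is_p1(t):
    return t["PNo"] == "P1"


# Selection after the join versus selection pushed below the join.
late = select(join(employees, assignment, on_key), is_p1)
early = join(employees, select(assignment, is_p1), on_key)

print(canon(late) == canon(early))  # True
```

Both plans return the same tuples, but the second one joins against two Assignment tuples instead of three. That reduction in intermediate size is the motivation behind the "push selection" heuristic.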
  • 9. Algebraic optimization rules: further rules
      Commutativity of $\pi$ and $\cup$: $\pi_X(r_1 \cup r_2) \iff \pi_X(r_1) \cup \pi_X(r_2)$
      Distributive law for $\bowtie$ and $\cup$, distributive law for $\bowtie$ and $-$
      Commutativity of renaming $\beta$ with other operators, ...
      Idempotence, e.g., $A \vee A \iff A$
      Operations involving empty relations
      Commutative and associative laws for $\bowtie$, $\cup$, and $\cap$

    Heuristic algebraic optimization: example
      Use algebraic optimization heuristics
        Force join
        Push selection and projection

    Cost-based algebraic query optimization
      Most non-distributed RDBMS strongly rely on cost-based optimizations
        Aim for a better optimized plan with respect to system and data characteristics
        Join order optimization
      Basic approach
        Establish a cost model for various operations
        Enumerate all query plans and compute costs
        Pick the best query plan
      Usually, dynamic programming techniques are used to keep computational effort manageable

    Physical query optimization
      Input: optimized query plan consisting of algebra operators
      Choose an algorithm to compute a particular algebra operator
        Join: block-nested-loop join, hash join, merge join, ...
        Select: full table scan, index lookup, ad-hoc index generation & lookup, ...
      Tasks: translating a query plan into an execution plan
      Physical and algebraic optimization are often interleaved
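The basic approach of cost-based join ordering (define a cost model, enumerate plans, pick the cheapest) can be sketched with brute-force enumeration of left-deep join orders. Everything concrete here is an assumption for illustration: the cost model (sum of estimated intermediate result sizes), the Projects cardinality, and the join selectivities. A real optimizer would use dynamic programming rather than trying every permutation.

```python
from itertools import permutations


def plan_cost(order, card, sel):
    """Cost of a left-deep plan = sum of estimated intermediate result sizes."""
    size = card[order[0]]
    joined = [order[0]]
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for other in joined:
            # No join predicate between two relations means a Cartesian product.
            factor *= sel.get(frozenset((rel, other)), 1.0)
        size = size * card[rel] * factor
        cost += size
        joined.append(rel)
    return cost


def best_left_deep_order(card, sel):
    """Enumerate all left-deep orders and return the cheapest one."""
    return min(permutations(card), key=lambda order: plan_cost(order, card, sel))


# Hypothetical statistics: cardinalities and join selectivities.
card = {"Employees": 400, "Assignment": 1000, "Projects": 50}
sel = {
    frozenset(("Employees", "Assignment")): 1 / 400,
    frozenset(("Assignment", "Projects")): 1 / 50,
}

best = best_left_deep_order(card, sel)
print(best, plan_cost(best, card, sel))
```

Orders that join Employees with Projects first pay for a Cartesian product, since no predicate connects them, so the enumeration steers away from those orders.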
  • 10. Query optimization example
      Output: query execution plan
      [Figure: query optimization example]

    Basics of distributed query processing: Phases of distributed query processing
      [Figure: workflow for distributed query processing]
  • 11. Basics of distributed query processing: Introduction
      Basic considerations
        Distributed query processing
          Shares the properties of centralized query processing
          Similar problem but with different objectives and constraints
        Objectives for centralized query processing
          Minimize the number of disk accesses
          Minimize computational time
        Objectives for distributed query processing
          Minimize resource consumption
          Minimize response time
          Maximize throughput
        Costs are more difficult to predict
          Join selectivity: is it worthwhile to push down a selection?
          Data is distributed: difficult to get meaningful statistics
          Network latency is very hard to predict
          Current workload at nodes, load shedding
        Additional cost factors and constraints
          Extension of relational algebra (sending/receiving data)
          Data localization (which node holds relevant data)
          Replication and caching (where to compute an operation)
          Network models
          Response-time models
          Data and structural heterogeneity (federated databases, ...)
      Consequences
        Query optimization is much more difficult than in the central case
        Statistics and costs change over time, e.g., workload at a node, network load
        More conflicting optimization goals
          Increase throughput → reduce replication and parallelization
          Reduce query response time → increase parallelization
        More cost factors and constraints
        Consequences
          Adaptive query plans (create an initial plan and optimize it on-the-fly)
          Do not aim for the best plan, but for a good plan
      Example
        Return the names of all employees working for project 'P1'
          $\pi_{EName}(\pi_{EID,EName}(Employees) \bowtie_{Employees.EID = Assignment.ENo} \pi_{ENo}(\sigma_{PNo = \text{'P1'}}(Assignment)))$
        Problems
          Relations are fragmented and distributed among five nodes
          The Employees relation uses primary horizontal fragmentation: one fragment located at node 1, the other at node 2, no replication
          The Assignment relation uses derived horizontal fragmentation: one fragment located at node 3, the other at node 4, no replication
          The query originates from node 5
  • 12. Example (continued)
      Cost model and statistics
        Accessing a tuple costs 1 unit (acc)
        Transferring a tuple costs 10 units (trans)
        There are 400 employees and 1000 assignments
        20 assignments for project 'P1'
        All tuples are uniformly distributed, i.e., nodes 3 and 4 provide 10 assignments for project 'P1' each
        There are local indexes on attribute PNo at nodes 3 and 4 (as well as indexes on primary keys at all nodes)
        Direct tuple access is possible on local sites, no scanning
        All nodes can directly communicate with each other
        Simplification: no costs for unions and projections
      Simple execution plan, version A: transfer all data to node 5
      Simple execution plan, version B: ship intermediate results
  • 13. Example (continued)
      Costs of plan A: 23,000 units
      Costs of plan B: 440 units

    Important aspects of distributed query processing
      Meta data management
      Data localization
      Global query optimization
      Post-processing
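The transcript gives only the two totals. The sketch below shows one per-step breakdown that reproduces them from the stated cost model (acc = 1, trans = 10, 400 employees, 1000 assignments, 20 of them for 'P1'). The breakdown is an assumption reconstructed from those statistics, not taken from the slides: plan A is read as "ship everything to node 5, scan Assignment, nested-loop join", and plan B as "indexed selection at nodes 3 and 4, ship matches to nodes 1 and 2, indexed join, ship results to node 5".

```python
ACC = 1      # cost of accessing one tuple
TRANS = 10   # cost of transferring one tuple

EMPLOYEES = 400
ASSIGNMENTS = 1000
P1_ASSIGNMENTS = 20  # assignments for project 'P1' (10 each at nodes 3 and 4)


def cost_plan_a():
    """Assumed breakdown: transfer all data to node 5 and evaluate there."""
    transfer = (EMPLOYEES + ASSIGNMENTS) * TRANS  # ship both relations
    selection = ASSIGNMENTS * ACC                 # full scan, no index at node 5
    join = P1_ASSIGNMENTS * EMPLOYEES * ACC       # nested loop over Employees
    return transfer + selection + join


def cost_plan_b():
    """Assumed breakdown: evaluate locally and ship intermediate results."""
    selection = P1_ASSIGNMENTS * ACC        # index lookup on PNo at nodes 3 and 4
    ship_matches = P1_ASSIGNMENTS * TRANS   # send matches to nodes 1 and 2
    join = P1_ASSIGNMENTS * ACC             # primary-key index lookup per match
    ship_result = P1_ASSIGNMENTS * TRANS    # send join results to node 5
    return selection + ship_matches + join + ship_result


print(cost_plan_a())  # 23000
print(cost_plan_b())  # 440
```

Under this reading, plan B is roughly 50 times cheaper because it moves 40 tuples across the network instead of 1400 and never scans a full relation.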
  • 14. Meta data management
      [Figure: workflow for distributed query processing]
      Prerequisites to perform query optimization
        Meta data must be available
        Meta data is stored in the catalog
        Catalog provides information about the data distribution
        Use this information to decide, for instance, if it is worthwhile to execute a selection very early
      Typical contents of a catalog for distributed database management systems
        Database schema: definitions of tables, views, constraints, keys, ...
        Partitioning schema: information about how the schema is partitioned and how tables can be reconstructed
        Allocation schema: information about which fragment can be found at which node (including information about replication)
        Network information: information about node connections, network model
        Additional physical information: information about indexes, data statistics (histograms, etc.), hardware resources (processing & storage), ...
      Where to store the catalog in a distributed system?
        Central node: simple solution, bottleneck
        Replicated at all nodes: updates are expensive
        Fragmented: in rare cases, the catalog may become very large; the catalog has to be fragmented and allocated
        Caching: replicate only needed parts of a central catalog, anticipate potential inconsistencies
  • 15. Meta data management (continued)
      Centralized catalog: one instance of the global catalog at a central node
        Advantages: no need to update copies; little memory consumption
        Disadvantages: communication with central node for each query; central node potentially represents a bottleneck
      Replicated catalog: full copy of the global catalog at each node
        Advantages: little communication overhead for queries; good availability
        Disadvantages: high update costs
      Fragmented catalog: partitioning the global catalog and assigning partitions to nodes
        Advantages: sharing load among nodes; reducing update overhead
        Disadvantages: localizing necessary partitions of the global catalog
      Caching catalog data: caching non-local catalog data
        Advantages: avoiding remote access to frequently needed catalog data; reducing communication overhead
        Disadvantages: coherency control; invalidating cached copies in the presence of updates
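Cached catalog copies can go stale when the owner updates an entry. One way to detect this at query time is to attach the version number of the cached entry to the query and let the owner reject outdated versions. The sketch below illustrates that idea under simplifying assumptions: a single owner, in-process calls instead of network messages, and hypothetical class and method names (`CatalogOwner`, `CachingNode`, `accepts`).

```python
class CatalogOwner:
    """Holds the authoritative catalog entries, each with a version number."""

    def __init__(self):
        self.entries = {}  # name -> (version, value)

    def put(self, name, value):
        version = self.entries.get(name, (0, None))[0] + 1
        self.entries[name] = (version, value)

    def get(self, name):
        return self.entries[name]

    def accepts(self, name, seen_version):
        """A query carries the version it was planned with; reject stale ones."""
        return self.entries[name][0] == seen_version


class CachingNode:
    """Caches catalog entries locally and refreshes them when found stale."""

    def __init__(self, owner):
        self.owner = owner
        self.cache = {}

    def lookup(self, name):
        if name not in self.cache:
            self.cache[name] = self.owner.get(name)  # remote access
        return self.cache[name]

    def run_query(self, name):
        version, value = self.lookup(name)
        if not self.owner.accepts(name, version):
            del self.cache[name]                     # cached copy was outdated
            version, value = self.lookup(name)
        return value


owner = CatalogOwner()
owner.put("Projects1", "node 1")
node = CachingNode(owner)
print(node.run_query("Projects1"))  # node 1

owner.put("Projects1", "node 3")    # allocation changes, cache is now stale
print(node.run_query("Projects1"))  # node 3
```

The owner does not need to track which nodes hold copies here. The price is that a stale entry is only discovered when a query that uses it is actually executed.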
  • 16. Meta data management: caching catalog data
      Explicit invalidation
        Owner of catalog data remembers nodes with local copies
        In case of updates: sending an invalidation message to nodes with local copies
      Implicit invalidation
        Identifying old catalog data during runtime (adding version numbers and time stamps to query messages)

    Data localization
      [Figure: workflow for distributed query processing]
      Objective: creating subqueries in consideration of the data distribution
      Assumptions
        Fragmentation is defined by fragmentation expressions
        Each fragment is allocated only at one node (no replication)
        Fragmentation expressions and locations of the fragments are stored in the catalog
      Main tasks
        Replace access to global relations with accesses to the fragments
        Insert reconstruction expression into algebra query
        Basic algebraic simplifications of the query
      Example: horizontal reduction
        Schema
          $Projects_1 = \sigma_{Budget \le 150{,}000}(Projects)$
          $Projects_2 = \sigma_{150{,}000 < Budget \le 200{,}000}(Projects)$
          $Projects_3 = \sigma_{Budget > 200{,}000}(Projects)$
        Reconstruction expression (horizontal fragmentation)
          $Projects = Projects_1 \cup Projects_2 \cup Projects_3$
        Example query
          $\sigma_{Location = \text{'Saarbr.'} \,\wedge\, Budget \le 100{,}000}(Projects)$
        After replacing references to global relations
          $\sigma_{Location = \text{'Saarbr.'} \,\wedge\, Budget \le 100{,}000}(Projects_1 \cup Projects_2 \cup Projects_3)$
        Further optimization is possible!
  • 17. Distributed Database Systems. Basics of distributed query processing: Data localization (Katja Hose, Distributed Database Systems, November 10, 2011, slides 65-68 / 167)
    Query simplification: horizontal reduction
      Objective: eliminate unnecessary subqueries.
      Horizontal reduction rule: given fragments of R as F_R = {R_1, ..., R_n} with R_i = σ_{p_i}(R), all fragments R_i for which σ_{p_s}(R_i) = ∅ can be removed, with p_s denoting the query's selection predicate.
        σ_{p_s}(R_i) = ∅ ⇐ ∀x ∈ R: ¬(p_s(x) ∧ p_i(x))
      The selection with the query predicate p_s on fragment R_i is empty if p_s contradicts the fragmentation predicate p_i of R_i, i.e., p_s and p_i are never true at the same time for any tuple.
    Example: horizontal reduction
      Query with fragmentation expression:
        σ_{Location = 'Saarbr.' ∧ Budget ≤ 100.000}(Projects1 ∪ Projects2 ∪ Projects3)
      Fragment definitions:
        Projects1 = σ_{Budget ≤ 150.000}(Projects)
        Projects2 = σ_{150.000 < Budget ≤ 200.000}(Projects)
        Projects3 = σ_{Budget > 200.000}(Projects)
      Because of
        σ_{Budget ≤ 100.000}(Projects2) = ∅, σ_{Budget ≤ 100.000}(Projects3) = ∅
      we obtain the reduced query
        σ_{Location = 'Saarbr.'}(σ_{Budget ≤ 100.000}(Projects1))
    Query simplification: join reduction
      Join reductions: larger joins are replaced by multiple partial joins on fragments.
        Distributive law: (R_1 ∪ R_2) ⋈ S = (R_1 ⋈ S) ∪ (R_2 ⋈ S)
        Eliminate all union fragments that will return an empty result.
      Expectations:
        Elimination of partial joins producing empty results (depends on fragmentation optimality).
        Many joins on small relations have lower resource costs than one large join (depends on fragmentation and applied join algorithms).
        Smaller joins can be executed in parallel (might decrease response time but might also increase communication costs).
    Example: join reduction
      Schema:
        Projects(PNo, PName, Budget, Location)
          Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects)
          Projects2 = σ_{PNo = 'P3'}(Projects)
          Projects3 = σ_{PNo = 'P4'}(Projects)
        Assignment(ENo, PNo, Duration)
          Assignment1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Assignment)
          Assignment2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Assignment)
      Example query:
        select * from Projects p, Assignment a where p.PNo = a.PNo
      In relational algebra:
        Projects ⋈ Assignment
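The horizontal reduction rule can be sketched in Python for the special case where both the fragmentation predicates and the query predicate are range conditions on a single attribute (here Budget). The `Interval` class and the function `reduce_horizontal` are hypothetical helpers, not part of the lecture; budgets are written as plain numbers (150000 for 150.000).

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Interval:
    """Range predicate low (<|<=) Budget (<|<=) high."""
    low: float
    high: float
    low_open: bool = False
    high_open: bool = False

    def intersects(self, other):
        # Tightest lower bound: larger value wins, open wins on ties.
        low, low_open = max((self.low, self.low_open), (other.low, other.low_open))
        # Tightest upper bound: smaller value wins, open wins on ties.
        high, high_closed = min((self.high, not self.high_open),
                                (other.high, not other.high_open))
        if low < high:
            return True
        return low == high and not low_open and high_closed


# Fragmentation predicates p_i of Projects1..Projects3 from the example.
FRAGMENTS = {
    "Projects1": Interval(-inf, 150000),
    "Projects2": Interval(150000, 200000, low_open=True),
    "Projects3": Interval(200000, inf, low_open=True),
}


def reduce_horizontal(fragments, query_predicate):
    """Keep only fragments whose predicate does not contradict p_s."""
    return [name for name, p_i in fragments.items()
            if p_i.intersects(query_predicate)]


# Query predicate p_s: Budget <= 100.000
relevant = reduce_horizontal(FRAGMENTS, Interval(-inf, 100000))
print(relevant)
```

For the example query only `Projects1` survives, which matches the reduced query on the slide. General predicates would need a satisfiability check instead of an interval intersection.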
  • 18. Distributed Database Systems. Basics of distributed query processing: Data localization (Katja Hose, Distributed Database Systems, November 10, 2011, slides 69-72 / 167)
    Example: join reduction
      Query:
        Projects ⋈ Assignment
      After replacing global relations with reconstruction expressions:
        (Projects1 ∪ Projects2 ∪ Projects3) ⋈ (Assignment1 ∪ Assignment2)
      After applying the distributive law:
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
        (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)
      Further optimization is possible!
    Query simplification: join reduction
      Join reduction rule: given fragments of R as F_R = {R_1, ..., R_n} and fragments of S as F_S = {S_1, ..., S_m}:
        Apply the distributive law, e.g.:
          (R_1 ∪ R_2) ⋈ (S_1 ∪ S_2) = (R_1 ⋈ S_1) ∪ (R_1 ⋈ S_2) ∪ (R_2 ⋈ S_1) ∪ (R_2 ⋈ S_2)
        All partial joins between fragments R_i and S_j for which R_i ⋈ S_j = ∅ can be removed:
          R_i ⋈ S_j = ∅ ⇐ ∀x ∈ R_i, y ∈ S_j: ¬(p_i(x) ∧ p_j(y))
      The join between fragments R_i and S_j is empty if their respective fragmentation predicates (on the join attribute) contradict, i.e., there is no tuple combination x and y such that both partitioning predicates are fulfilled at the same time.
    Example: join reduction
      Query with fragmentation expression:
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪
        (Projects3 ⋈ Assignment1) ∪ (Projects3 ⋈ Assignment2)
      Some of these partial joins are empty, e.g.:
        Projects1 ⋈ Assignment2 = ∅
      because their fragmentation expressions contradict:
        Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects) and
        Assignment2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Assignment)
      Reduced query:
        (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2) ∪ (Projects3 ⋈ Assignment2)
    Query simplification: join reduction for horizontal fragmentation
      The easiest join reduction case follows from derived horizontal fragmentation:
        For each fragment of the first relation, there is exactly one matching fragment of the second relation.
        Simply use the information contained in the reconstruction expression instead of comparing the reconstruction predicates to each other.
      Join reduction for arbitrary horizontal partitioning might not be beneficial.
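The join reduction rule can be tried out on the example with a short Python sketch. It assumes that every fragmentation predicate is a disjunction of equality tests on the join attribute PNo, so each fragment can be described by the set of PNo values it admits; the function name `reduce_join` is hypothetical.

```python
from itertools import product

# PNo values admitted by each fragmentation predicate (from the example schema).
PROJECTS = {
    "Projects1": {"P1", "P2"},
    "Projects2": {"P3"},
    "Projects3": {"P4"},
}
ASSIGNMENT = {
    "Assignment1": {"P1", "P2"},
    "Assignment2": {"P3", "P4"},
}


def reduce_join(r_fragments, s_fragments):
    """Enumerate all partial joins R_i x S_j produced by the distributive law
    and keep those whose predicates on the join attribute do not contradict."""
    return [
        (r_name, s_name)
        for (r_name, r_values), (s_name, s_values)
        in product(r_fragments.items(), s_fragments.items())
        if r_values & s_values  # a common PNo value exists
    ]


partial_joins = reduce_join(PROJECTS, ASSIGNMENT)
for r_name, s_name in partial_joins:
    print(f"{r_name} ⋈ {s_name}")
```

Three of the six partial joins remain, the same three as in the reduced query of the example.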
  • 19. Distributed Database Systems. Basics of distributed query processing: Data localization (Katja Hose, Distributed Database Systems, November 10, 2011, slides 73-76 / 167)
    Query simplification: join reduction for derived horizontal fragmentation
      Example:
        Projects(PNo, PName, Budget, Location)
          Projects1 = σ_{PNo = 'P1' ∨ PNo = 'P2'}(Projects)
          Projects2 = σ_{PNo = 'P3' ∨ PNo = 'P4'}(Projects)
        Assignment(ENo, PNo, Duration)
          Assignment1 = Assignment ⋉ Projects1
          Assignment2 = Assignment ⋉ Projects2
      Query in relational algebra:
        Projects ⋈ Assignment
      After replacing global relations with reconstruction expressions:
        (Projects1 ∪ Projects2) ⋈ (Assignment1 ∪ Assignment2)
      After applying the distributive law:
        (Projects1 ⋈ Assignment1) ∪ (Projects1 ⋈ Assignment2) ∪
        (Projects2 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)
      Reduced query (using information about the fragmentation of relation Assignment directly):
        (Projects1 ⋈ Assignment1) ∪ (Projects2 ⋈ Assignment2)
    Query simplification: vertical reduction
      Vertical fragmentation rule: given fragments of R as F_R = {R_1, ..., R_n} with R_i = π_{β_i}(R), with β_i representing the enumeration of a subset of R's attributes:
        Avoid joining fragments containing "useless" attributes, i.e., fragments containing only attributes that are not referenced in the query and not output in the result.
    Example: vertical reduction
      Schema:
        Projects(PNo, PName, Budget, Location)
          Projects1 = π_{PNo, PName, Location}(Projects)
          Projects2 = π_{PNo, Budget}(Projects)
      Reconstruction expression:
        Projects = Projects1 ⋈ Projects2
      Example query:
        π_{PName}(Projects)
      After replacing references to global relations:
        π_{PName}(Projects1 ⋈ Projects2)
      After removing unnecessary fragments:
        π_{PName}(Projects1)
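The vertical reduction rule can be sketched in a few lines of Python. The sketch assumes that every vertical fragment repeats the key attribute PNo (as in the example schema) and treats a fragment as useful only if it contributes a non-key attribute the query needs; `reduce_vertical` is a hypothetical helper name.

```python
# Attribute sets beta_i of the vertical fragments from the example schema.
FRAGMENTS = {
    "Projects1": {"PNo", "PName", "Location"},
    "Projects2": {"PNo", "Budget"},
}
KEY = {"PNo"}  # assumption: the key is replicated in every vertical fragment


def reduce_vertical(fragments, key, needed):
    """Return the fragments that must be joined to answer a query that
    references or outputs the attributes in `needed`."""
    kept = [name for name, attrs in fragments.items()
            if (attrs - key) & needed]
    if not kept and needed:
        # Only key attributes are needed: any single fragment suffices.
        kept = [next(iter(fragments))]
    return kept


# pi_PName(Projects) only needs PName.
print(reduce_vertical(FRAGMENTS, KEY, {"PName"}))
```

For the example query the join with `Projects2` disappears, while a query that also references Budget would still need both fragments.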
  • 20. Distributed Database Systems. Basics of distributed query processing: Data localization (Katja Hose, Distributed Database Systems, November 10, 2011, slide 77 / 167, and November 17, 2011, slides 78-80 / 167)
    Query simplification: hybrid fragmentation
      The reconstruction expression introduces combinations of joins and unions.
      General guidelines:
        Remove empty relations generated by contradicting selections on horizontal fragments.
        Remove useless relations generated by vertical fragments.
        Break and distribute joins, eliminate empty fragment joins.
    Qualified relations
      Supporting algebraic optimization of queries involving fragments.
      Annotating fragments and intermediate relations with predicates.
      Estimating the size of a relation.
      Extension of relational algebra.
      Definition (qualified relation): a qualified relation is a pair [R : q_R] where R is a relation and q_R is a predicate.
      Example: representing horizontal fragments as qualified relations where the qualification predicate corresponds to the fragmentation expression:
        [Projects : σ_{PNo = 'P1' ∨ PNo = 'P2'}]
    Qualified relations: extended relational algebra
      (1) E := σ_F [R : q_R] → [E : F ∧ q_R]
      (2) E := π_A [R : q_R] → [E : q_R]
      (3) E := [R : q_R] × [S : q_S] → [E : q_R ∧ q_S]
      (4) E := [R : q_R] − [S : q_S] → [E : q_R]
      (5) E := [R : q_R] ∪ [S : q_S] → [E : q_R ∨ q_S]
      (6) E := [R : q_R] ⋈_F [S : q_S] → [E : q_R ∧ q_S ∧ F]
    Qualified relations: example
      Example query:
        σ_{100.000 ≤ Budget ≤ 200.000}(Projects)
      Qualified relations:
        E1 = σ_{100.000 ≤ Budget ≤ 200.000} [Projects1 : Budget ≤ 150.000]
          [E1 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget ≤ 150.000)]
          [E1 : 100.000 ≤ Budget ≤ 150.000]
        E2 = σ_{100.000 ≤ Budget ≤ 200.000} [Projects2 : 150.000 < Budget ≤ 200.000]
          [E2 : (100.000 ≤ Budget ≤ 200.000) ∧ (150.000 < Budget ≤ 200.000)]
          [E2 : 150.000 < Budget ≤ 200.000]
        E3 = σ_{100.000 ≤ Budget ≤ 200.000} [Projects3 : Budget > 200.000]
          [E3 : (100.000 ≤ Budget ≤ 200.000) ∧ (Budget > 200.000)]
          E3 = ∅
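Rule (1) of the extended algebra and the emptiness test used for E3 can be sketched in Python. To keep the sketch short it makes two assumptions that are not in the slides: budgets are integers (so Budget > 150.000 is written as Budget ≥ 150001), and every qualification is a closed range on Budget. The class `QualifiedRelation` is a hypothetical illustration of the pair [R : q_R].

```python
from dataclasses import dataclass

# Stand-ins for "no lower bound" and "no upper bound" on an integer budget.
MIN_BUDGET = 0
MAX_BUDGET = 10**12


@dataclass(frozen=True)
class QualifiedRelation:
    """[name : low <= Budget <= high] with integer bounds."""
    name: str
    low: int
    high: int

    def select(self, new_name, low, high):
        # Rule (1): sigma_F [R : q_R] -> [E : F AND q_R].
        # The conjunction of two ranges is their intersection.
        return QualifiedRelation(new_name, max(self.low, low), min(self.high, high))

    def is_empty(self):
        # A contradictory qualification means the relation is provably empty.
        return self.low > self.high


fragments = [
    QualifiedRelation("Projects1", MIN_BUDGET, 150000),
    QualifiedRelation("Projects2", 150001, 200000),
    QualifiedRelation("Projects3", 200001, MAX_BUDGET),
]

# Example query: sigma_{100.000 <= Budget <= 200.000}(Projects)
results = [f.select(f"E{i}", 100000, 200000) for i, f in enumerate(fragments, start=1)]
for e in results:
    print(e.name, "empty" if e.is_empty() else (e.low, e.high))
```

E1 and E2 keep the simplified qualifications from the example and E3 is recognized as empty without touching any data, which is the purpose of carrying the predicate along with the relation.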
  • 21. Distributed Database Systems. Global query optimization (Katja Hose, Distributed Database Systems, November 17, 2011, slides 81-84 / 167)
    Contents
      1 Motivation
      2 Detour on centralized query processing: Translating SQL into relational algebra; Phases of centralized query processing; Query parsing; Query transformation; Query optimization
      3 Basics of distributed query processing: Phases of distributed query processing; Introduction; Meta data management; Data localization
      4 Global query optimization: Main questions; Global query optimizer; Distributed cost model; Join order optimization; Total time models; Response time models
      5 Summary
    Main questions: Workflow for distributed query processing (diagram; not captured in the transcript)
    Main questions: Introduction to global query optimization
      When to optimize?
      What criteria to optimize?
      Where to execute the query?
  • 22. Distributed Database Systems. Global query optimization: Main questions (Katja Hose, Distributed Database Systems, November 17, 2011, slides 85-88 / 167)
    When to optimize? Full compile time optimization
      The full query execution plan is computed at compile time.
      Assumption: applications use canned queries (prepared and parameterized SQL statements).
      Pros: queries can be executed directly.
      Cons: complex to model; much information is unknown or too expensive to gather (collecting statistics on all nodes?); statistics become outdated, especially because machine load and network properties are very volatile.
    When to optimize? Fully dynamic optimization
      Each query is optimized individually at runtime.
      This technique heavily relies on heuristics, learning algorithms, and luck.
      Pros: might produce very good plans; uses the current network state; also usable for ad-hoc queries.
      Cons: result quality might be very unpredictable; complex algorithms and heuristics; difficult to keep statistics up-to-date.
    When to optimize? Semi-dynamic optimization
      Pre-optimize the query.
      During query execution, test whether execution runs as expected during optimization, e.g., are tuples/fragments delivered in time? Does the network adhere to the predicted properties? Are there any bad network latencies?
      If execution shows severe deviations, compute a new query plan for all the parts that have not yet been executed.
      Makes sense only for queries that run for a longer time.
    When to optimize? Hierarchical optimization
      Plans are created in multiple stages.
      Global-Local-Plans:
        The global query optimizer creates a global query plan. Focus on data transfer: which intermediate results are to be computed by which node? How should intermediate results be shipped?
        Local query optimizers create local query plans. They decide on query plan layout, algorithms, indexes, etc. to deliver the requested intermediate result.
      Two-Step-Plans
  • 23. Distributed Database Systems. Global query optimization: Main questions (Katja Hose, Distributed Database Systems, November 17, 2011, slides 89-92 / 167)
    When to optimize? Hierarchical optimization
      Plans are created in multiple stages.
      Global-Local-Plans
      Two-Step-Plans:
        During compile time, only stable parts of the plan are computed (join order, join methods, access paths, etc.).
        During query execution, all missing plan elements are added (node selection, transfer policies, etc.).
        Both steps can be performed using traditional query optimization techniques: plan enumeration with dynamic programming; complexity is manageable as each optimization problem is much easier than a full optimization; during runtime optimization, fresh statistics are available.
      Most distributed database management systems use semi-dynamic or hierarchical optimization techniques (or both).
    What criteria to optimize?
      Important aspects for global optimization:
        Communication operators
        Fragment cardinalities
        Order of operations
        Join ordering, because permutations of the joins within the query may lead to improvements of orders of magnitude
      Most important alternative optimization criteria:
        Query response time
        Resource consumption
        Total query execution costs
        ...
    Where to execute the query?
      The query optimizer has to decide which parts of the query have to be shipped to which node (cost model).
      In heavily replicated scenarios, clever hybrid shipping can effectively be used for load balancing: move expensive computations to lightly loaded nodes, avoid expensive communication.
    Global query optimization
      Global query optimization deals with finding the "best" ordering of operations in the query (extended by fragmentation expressions and including communication operations) that minimizes a cost function.
      Input: an algebraic query extended by fragmentation expressions.
      Output: an algebraic query or query execution plan with communication operations.
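The remark that join order permutations can change costs by an order of magnitude can be made concrete with a small Python sketch. Everything numeric here is invented for illustration: the relations R, S, T, their cardinalities, the pairwise join selectivities, and the cost function (sum of the sizes of all results produced by a left-deep plan). It is a toy cost model, not the distributed cost model of the lecture, and it enumerates all permutations instead of using dynamic programming.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Hypothetical statistics (assumptions for this sketch only).
CARDINALITY = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY = {
    frozenset({"R", "S"}): Fraction(1, 100),
    frozenset({"S", "T"}): Fraction(1, 10),
    frozenset({"R", "T"}): Fraction(1),  # no join predicate: cross product
}


def result_size(relations):
    """Estimated size of joining a set of relations."""
    size = Fraction(1)
    for name in relations:
        size *= CARDINALITY[name]
    for pair in combinations(relations, 2):
        size *= SELECTIVITY[frozenset(pair)]
    return size


def plan_cost(order):
    """Cost of a left-deep plan: sum of all join result sizes it produces."""
    return sum(result_size(order[:k]) for k in range(2, len(order) + 1))


costs = {order: plan_cost(order) for order in permutations(CARDINALITY)}
best = min(costs, key=costs.get)
worst = max(costs, key=costs.get)
print("best ", best, costs[best])
print("worst", worst, costs[worst])
```

Under these made-up statistics the cheapest order starts with the selective join between S and T, while orders that start with the cross product of R and T cost ten times as much.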