Speaker notes:
  • This is basically a summary of the entire paper.
  • Result: an algebraic query tree that optimizes the order in which operators are applied.
    Tree A is not the best plan because the selection operation is applied after the join.
    Tree B is the optimal algebraic plan because all selection and projection operations are applied as early as possible.
  • In a ubiquitous environment, there are no global views because they are expensive to maintain!
  • Given: an algebraic tree (from logical optimization)
    Result: all corresponding execution plans that specify the implementation of each algebraic operator
  • Classical query optimization techniques typically generate execution plans that are optimized along a single dimension: query execution time.
    - Useful knowledge must be obtained from previously executed queries, then managed and exploited by means of automatic learning techniques
    - GOAL: improve or acquire new capabilities from experience related to some specific tasks
    - Query evaluation time is no longer the main optimization objective
  • Given a new query Q, an existing query plan is retrieved if it can be adapted to Q. It must also be verified that the plan can be executed with the computational resources available at query execution time (memory, CPU, energy).
  • It is also necessary to pay attention to the computational resources consumed by the query and to those available when the new query will be executed, as well as to the optimization objective, which can change each time the query is executed.
  • The operator and the attribute value are not important for determining the operation family to which a specific operation pertains; the important knowledge is the operation type and the attribute(s) included in the operation.
  • Similarity of a and b is defined in terms of the features common to a and b,
    minus the features that pertain to a but not to b,
    and those that pertain to b but not to a.
    θ, α, β: non-negative parameters that determine the relative weight of the three components of similarity
    - they provide the flexibility to modify the importance of similarities or differences according to the area of application

    1. Query Optimization Using Case-based Reasoning in Ubiquitous Environments
       Lourdes Angelica Martinez-Medina, Christophe Bobineau, Jose Luis Zechinelli-Martini
       2009 Mexican International Conference on Computer Science (ENC 09)
       2011/05/16 - Ria Mae Borromeo
    2. Introduction
       Query Optimization
       • Relies on cost models that depend on metadata (statistics, cardinality estimates)
       • Typically restricted to execution time estimation
       Problem
       • In some computational environments, e.g. ubiquitous environments, acquiring and maintaining metadata is expensive
       Proposed Solution
       • A query optimization technique based on learning, specifically case-based reasoning
    3. Ubiquitous Environment
       Integrates information from different computational tools and applications
       Characteristics
       1. Heterogeneity
          • extensive range of computational resources and electronic devices
          • devices have different physical and logical characteristics
       2. Dynamicity
          • resources change continuously due to mobility
          • the properties of the communication network and of the resources that interact with it vary
    4. Ubiquitous Environment
       3. Distribution
          • resources are distributed within a physical space, so the information used by these resources is also distributed
       4. Autonomy
          • resources can change their availability status at any time
       5. Physical Constraints
          • e.g. processing and storage capability, energy consumption, location
       6. Lack of Metadata
          • constant changes --> expensive maintenance --> no global schema
    5. Classical Query Optimization
       • Composed of three phases: logical, global, and physical
       • Logical and physical optimization phases are related to centralized environments; global optimization is required in distributed environments
       • Evaluation cost models used by most classical query optimization techniques are tightly tied to metadata
       • Each phase requires different metadata types and has different optimization objectives
       Figure 1. Phases of the optimization process
    6. 6. Classical Query Optimization Logical Optimization  Aims to reduce the number of tuples combined as intermediate results  Appropriate order for applying selection, projection and join operators must be decided  Uses heuristics and metadata  Result: Figure 2. Algebraic query trees 6
    7. Classical Query Optimization
       Global Optimization
       • Aims to minimize the communication cost related to interactions among resources and a set of views
       • Global optimizer: decides where to perform each part of the execution tree
       • Result: a new execution tree with communication operators
    8. Classical Query Optimization
       Physical Optimization
       • Aims to reduce disk access for retrieving the requested data and to minimize the execution time of query plans
       • Metadata related to the execution context is required
       Figure 2. Algebraic query trees; Figure 3. Query execution plan
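Physical optimization maps each algebraic operator to an implementation. A minimal Python sketch of that enumeration, assuming a hypothetical catalogue of two selection and three join implementations (the slides do not list concrete algorithms), shows how quickly the number of candidate execution plans grows with the number of operators.

```python
from itertools import product

# Hypothetical catalogue of physical implementations per algebraic operator type.
IMPLEMENTATIONS = {
    "select": ["table_scan", "index_scan"],
    "project": ["pipelined_project"],
    "join": ["nested_loop_join", "hash_join", "sort_merge_join"],
}


def enumerate_physical_plans(algebraic_ops):
    """Yield every assignment of one implementation to each algebraic operator.

    algebraic_ops: list of (op_id, op_type) pairs read off an algebraic tree.
    """
    choices = [IMPLEMENTATIONS[op_type] for _, op_type in algebraic_ops]
    for combo in product(*choices):
        yield {op_id: impl for (op_id, _), impl in zip(algebraic_ops, combo)}


# Two selections, two joins, one projection: 2 * 2 * 3 * 3 * 1 candidate plans.
tree = [("s1", "select"), ("s2", "select"), ("j1", "join"), ("j2", "join"), ("p1", "project")]
plans = list(enumerate_physical_plans(tree))
print(len(plans))  # 36
```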
    9. Contribution of the Paper
       • Proposes a query optimization technique for ubiquitous environments
       • Allows query optimization according to user requirements
       • Query optimization based on learning
         • Goal: improve or acquire new capabilities from experience related to some specific tasks
    10. Query Optimization Based on Learning
        Learn from past experience!
        • Experience: the knowledge gained from solving a problem
        • Learning: the acquisition of knowledge in order to improve behavior or to acquire new capabilities from previous experiences
        • Machine Learning: a sub-discipline of AI in charge of designing and developing methods that allow computers to learn automatically in order to improve or create specific capabilities
    11. 11. Case-based Reasoning Proposes a reasoning process that aims to solve new problems using the experience gained when similar problems are solved Case minimum unit of reasoning Problem Description Solution Set of annotations that describe how the solution was derived 11
    12. Case-based Reasoning Process
        Case-based reasoning has been formalized as a four-step process [7]:
        (1) Retrieve: get relevant cases
        (2) Reuse: adjust the solution of the relevant case to the problem
        (3) Review: the new solution must be verified in the real world (or by simulation)
        (4) Retain: store it as a new case in the memory
        Figure 4. Case-based reasoning process
    13. Case-based Reasoning Adaptation to Query Optimization
        • Adapts case-based reasoning to provide optimal execution plans for new queries
        • Uses the knowledge acquired from experience to optimize and execute similar queries
        • The solution is represented by the current execution plan
        Components: 1. Query  2. Problem  3. Case  4. Reasoning Process
    14. (1) Query
        • Modular part of knowledge in the definition of a problem and a case
        • The most important piece of knowledge: it links a problem with the existing cases
        • Composed of selectClause, fromClause, and whereClause
        • The whereClause specifies the set of conditions (for data selection and data combination, i.e. join) that the data must verify to form part of the query result; selection and join operations are the most frequent
        Figure 5. Query representation (UML diagram)
    15. 15. 1. Query Query Operation  Type  Select condition(atttexp, cnstexp)  Join condition(attrexp.a, attrexp.b)  Set of attributes  Specific Condition Q = {O1, O2, O3, O4 } SELECT Rest.nom FROM Resto, Ville, Region WHERE Region.nom = ‘RA’ O1 AND Resto.spec = ‘IT’ O2 AND Resto.vil = Ville.nom O3 AND Ville.numDep = Region.numDep O4 15
    16. (1) Query
        Operation Family
        • Used to group operations that include the same condition applied to the same attributes (and therefore the same relations)
        • Two operations Ox and Oy are from the same operation family if they have:
          • the same operation type (selection or join)
          • the same attributes (each from the same data source)
        • An operation family is represented as follows:
          (1) $R.a_n = \{\, o_n \mid o_n = \mathrm{condition}(R.a_n, \mathit{value}) \,\}$
        • The operation family R.a_n is composed of the operations o_n with a condition of the form condition(R.a_n, value), where a_n is an attribute that pertains to the relation R
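A minimal sketch of Eq. (1) in Python: the family key keeps only the operation type and the attributes, and deliberately ignores the comparison operator and the constant. Treating a join's attributes as an unordered set, so that `Resto.vil = Ville.nom` and `Ville.nom = Resto.vil` fall in the same family, is our assumption; the slides do not say whether attribute order matters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    kind: str                      # "select" or "join"
    attributes: tuple              # e.g. ("Resto.spec",) or ("Resto.vil", "Ville.nom")
    operator: str                  # comparison operator, e.g. "=" or "<>"
    value: Optional[str] = None    # constant for selections, None for joins


def family_of(op):
    """Operation family key: type + attribute set; operator and value are ignored."""
    return (op.kind, frozenset(op.attributes))


def group_by_family(operations):
    """Group operations into operation families (Eq. (1))."""
    families = defaultdict(list)
    for op in operations:
        families[family_of(op)].append(op)
    return dict(families)


ops = [
    Operation("select", ("Resto.spec",), "=", "IT"),
    Operation("select", ("Resto.spec",), "<>", "FR"),    # same family as the line above
    Operation("join", ("Resto.vil", "Ville.nom"), "="),
    Operation("join", ("Ville.nom", "Resto.vil"), "="),  # same family, attributes swapped
]
print(len(group_by_family(ops)))  # 2
```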
    17. (1) Query
        • condition ranges over all possible comparison operators: Equal, EqualOrLower, Lower, GreaterOrEqual, Greater, and Different
        • All queries are associated with a set of operation families
        • The whereClause of a query Q is defined by an operations set (O_n); these operations are members of different operation families
        • Operation families associated with each operation in Q:
          (2) $Q = \{\, R_1.a_1,\ R_2.a_2,\ (R_1.a_3, R_2.a_4),\ (R_2.a_4, R_3.a_5) \,\}$
        Class Description (C_n)
        • Each different combination of operation families forms a class description, i.e. the class C_n defined by the operation families in (3):
          (3) $C_n = \{\, R_n.a_n,\ R_m.a_m,\ (R_n.a_p, R_m.a_q),\ (R_2.a_4, R_3.a_5) \,\}$
        • The class C_n is composed of all queries Q_n that contain at least one operation that pertains to each of the specified families, as defined in (4)
    18. (1) Query
        • A query Q_n pertains to the class C_n if and only if, for every operation family that describes C_n, there exists an operation o_n in Q_n of the form of that operation family:
          (4) $Q_n \in C_n \iff (\forall\, R_n.a_n \in C_n)\ \exists\, o_n\ \big((o_n \in Q_n) \land (o_n \in R_n.a_n)\big)$
        • The operator and the attribute value are not important for determining the operation family of an operation; what matters is the operation type and the attribute(s) it involves
        Example, for Q = {O1, O2, O3, O4} from slide 15, with R1 = Ville, R2 = Resto, R3 = Region:
        • selection O1 (Region.nom = 'RA') pertains to family R3.a5
        • selection O2 (Resto.spec = 'IT') pertains to family R2.a2
        • join O3 (Resto.vil = Ville.nom) pertains to family (R2.a4, R1.a3)
        • join O4 (Ville.numDep = Region.numDep) pertains to family (R1.a1, R3.a6)
        a) These operation families make up a class: C = { R3.a5, R2.a2, (R2.a4, R1.a3), (R1.a1, R3.a6) }
        b) Any query composed of operations that pertain to these families belongs to the same class:
           $q \in C \iff (\forall\, R_n.a_n \in C)\ \exists\, o_n\ \big((o_n \in q) \land (o_n \in R_n.a_n)\big)$
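Eq. (4) translates almost literally into Python: for every family in the class description there must be some operation of the query in that family. In this sketch operations are plain `(kind, attributes, operator, value)` tuples and families are `(kind, frozenset(attributes))` pairs; both encodings, and the constant `'PACA'`, are assumptions made for illustration.

```python
def family_of(op):
    """Family key of an operation tuple: type + attribute set."""
    kind, attributes, _operator, _value = op
    return (kind, frozenset(attributes))


def belongs_to_class(query_ops, class_families):
    """Eq. (4): Q_n is in C_n iff every family of C_n has at least one operation in Q_n."""
    return all(any(family_of(op) == fam for op in query_ops) for fam in class_families)


# Class C built from the example query (R1 = Ville, R2 = Resto, R3 = Region).
C = {
    ("select", frozenset({"Region.nom"})),
    ("select", frozenset({"Resto.spec"})),
    ("join", frozenset({"Resto.vil", "Ville.nom"})),
    ("join", frozenset({"Ville.numDep", "Region.numDep"})),
}

# Same families, different operators and constants: still a member of C.
q_member = [
    ("select", ("Region.nom",), "=", "PACA"),
    ("select", ("Resto.spec",), "<>", "IT"),
    ("join", ("Resto.vil", "Ville.nom"), "=", None),
    ("join", ("Ville.numDep", "Region.numDep"), "=", None),
]
# Missing the Ville/Region join: not a member.
q_other = q_member[:3]
```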
    19. (2) Problem
        • It is also necessary to consider the computational resources consumed by the query and those available when the new query will be executed, as well as the optimization objective, which can change each time the query is executed
        • A problem is composed of a query, an execution context representation, and an optimization objective
        • Specifies the query to be optimized, the optimization parameters, and measures related to the computational resources available at query execution
          • query  • context  • optimization target
        Figure 6. Problem representation (UML diagram)
    20. (2) Problem
        • Context: represents a measure of the computational resources available when the query is executed, e.g. CPU load, available memory, and remaining energy
        • Optimization Objective: indicates the resource or set of resources whose consumption must be optimized; typically, optimization means minimizing the use of these resources, e.g. minimizing energy consumption
        • Example (Figure 7. An example of a problem instance):
          Context = { <memory, 400>, <CPU, 75>, <energy, 70> }
          Optimization objective: minimize memory consumption, specified by F(memory)
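A minimal Python sketch of the problem representation, using the context values from Figure 7 (units as in the figure). The `Problem` class and the `is_executable` check, which compares hypothetical resource estimates for a candidate plan against the available context, are our own illustration of the requirement that a plan must fit the resources available at execution time; the slides do not say how that check is computed.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    query: str        # the query to optimize
    context: dict     # resource -> amount available at execution time
    objective: str    # resource whose consumption F(resource) should be minimized


def is_executable(problem, estimated_needs):
    """Hypothetical check: each estimated need must fit in the available context."""
    return all(problem.context.get(resource, 0) >= amount
               for resource, amount in estimated_needs.items())


problem = Problem(
    query="SELECT Resto.nom FROM Resto, Ville, Region WHERE Region.nom = 'RA' "
          "AND Resto.spec = 'IT' AND Resto.vil = Ville.nom "
          "AND Ville.numDep = Region.numDep",
    context={"memory": 400, "CPU": 75, "energy": 70},
    objective="memory",
)
```

A resource missing from the context is treated as unavailable, which is a conservative choice for devices whose availability can change at any time.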
    21. (3) Case
        • A case is composed of a query, a solution (query plan), and a set of evaluation measures used to express the optimization objective
        • Specifies an optimized query, the solution to the query, and measures related to the computational resources consumed by the query execution
          • query  • solution  • evaluation measures
        Figure 8. Case representation (UML diagram)
    22. (3) Case
        • Query: the optimization target that has been evaluated and solved
        • Solution: the physical execution plan that solves the query, i.e. an ordered and pertinent sequence of selection, projection, sort, and join operations for accessing a set of data sources
        • Evaluation: the set of measures collected during query execution, represented as couples <attribute, value> that express the computational resources (e.g. memory, CPU, or energy) consumed by the query execution
        Figure 9. An example of a case
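The case structure can be sketched as an immutable Python record whose evaluation is a tuple of `<attribute, value>` couples. Choosing, among cases for the same query, the one that consumed the least of the resource named by the optimization objective is a hypothetical selection rule that mirrors an objective such as F(memory); the plan steps and measure values below are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    query: str          # the evaluated and solved query
    solution: tuple     # physical plan: ordered sequence of operators
    evaluation: tuple   # <attribute, value> couples measured during execution

    def measure(self, attribute):
        """Look up one consumed-resource measure, e.g. "memory"."""
        return dict(self.evaluation)[attribute]


def best_case_for(cases, objective):
    """Pick the case whose execution consumed the least of the objective resource."""
    return min(cases, key=lambda case: case.measure(objective))


q = "SELECT Resto.nom FROM Resto, Ville WHERE Resto.vil = Ville.nom"
cases = [
    Case(q, ("scan Resto", "scan Ville", "hash_join"), (("memory", 120), ("energy", 30))),
    Case(q, ("scan Ville", "scan Resto", "nested_loop_join"), (("memory", 90), ("energy", 45))),
]
```

With these values, minimizing memory favours the second case and minimizing energy favours the first, which is why the objective is stored with the problem rather than fixed in the optimizer.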
    23. (4) Reasoning Process
        • Retrieval
          * Get relevant cases using a similarity function; the matching process depends on the cases' similarity
          * If there is no relevant case in the case base, a new query plan must be pseudo-randomly generated to increase the query optimizer's knowledge
        • Reuse
          * Adjust the solution of the relevant case to the problem
        • Review
          * The execution plan is verified
        • Retention
          * The query class, query plan, and consumption measures are stored in the form of a case within the case base
        Figure 4. Case-based reasoning process
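The four steps can be sketched as one Python function. All callables (`similarity`, `adapt`, `execute`, `generate_plan`), the tuple layout of stored cases, and the relevance threshold are hypothetical parameters standing in for the components the slides name; the sketch only fixes the control flow, including the pseudo-random fallback when no relevant case exists.

```python
import random


def optimize(query, case_base, similarity, adapt, execute, generate_plan,
             threshold=0.0, rng=None):
    """One retrieve -> reuse -> review -> retain pass over a list-based case base.

    case_base: list of (query, plan, measures) tuples, updated in place.
    """
    rng = rng if rng is not None else random.Random(0)
    # Retrieval: find the stored case most similar to the new query.
    best_score, best_case = max(
        ((similarity(query, case[0]), case) for case in case_base),
        key=lambda scored: scored[0],
        default=(None, None),
    )
    if best_case is not None and best_score > threshold:
        plan = adapt(best_case[1], query)          # Reuse: adjust the stored plan
    else:
        plan = generate_plan(query, rng)           # No relevant case: pseudo-random plan
    measures = execute(plan)                       # Review: run and verify the plan
    case_base.append((query, plan, measures))      # Retention: store as a new case
    return plan, measures


def overlap(a, b):             # toy similarity: number of shared operations
    return len(a & b)


def keep_plan(plan, query):    # toy adaptation: reuse the stored plan unchanged
    return plan


def run(plan):                 # toy review: pretend memory use grows with plan size
    return {"memory": 10 * len(plan)}


def shuffle_plan(query, rng):  # toy generator: random order of the query's operations
    return tuple(rng.sample(sorted(query), len(query)))


base = []
q1 = frozenset({"sel Region.nom", "sel Resto.spec"})
q2 = frozenset({"sel Region.nom", "join Resto.vil=Ville.nom"})
plan1, _ = optimize(q1, base, overlap, keep_plan, run, shuffle_plan)  # empty base: generated
plan2, _ = optimize(q2, base, overlap, keep_plan, run, shuffle_plan)  # shares an op: reused
```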
    24. Similarity Function
        • Inter-class Similarity Function
          - used to define the membership of a query, i.e. the class to which it belongs
        • Intra-class Similarity Function
          - used to retrieve the most relevant case within the class [10][11]
          - once the most relevant case is retrieved, a detailed comparison between the clauses of the new query and the relevant query determines a similarity level between the two queries
        • Both are based on the contrast model of similarity proposed by Tversky [12], which uses a feature-matching function
          - similarity increases with common features and decreases with distinctive features [13]
          (5) $S(a, b) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A)$
        • Similarity between a and b is defined in terms of the features common to a and b (A ∩ B), those that pertain to a but not to b (A - B), and those that pertain to b but not to a (B - A)
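Eq. (5) is a one-liner once the features are sets. The sketch below assumes f is set cardinality, a common choice; the slides leave f and the weights θ, α, β open, and unequal α and β make the measure asymmetric.

```python
def tversky(a, b, theta=1.0, alpha=1.0, beta=1.0, f=len):
    """Contrast model, Eq. (5): theta*f(A & B) - alpha*f(A - B) - beta*f(B - A).

    f measures the salience of a feature set; set size is used by default.
    """
    A, B = set(a), set(b)
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)


print(tversky({1, 2, 3}, {2, 3, 4}))                        # 2 - 1 - 1 = 0.0
print(tversky({1, 2, 3}, {2, 3, 4}, alpha=0.5, beta=0.5))   # 2 - 0.5 - 0.5 = 1.0
```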
    25. Inter-class Similarity
        • Increasing function of common operation families
        • Decreasing function of distinctive families, i.e. families that pertain to one query but not the other
        • Can be applied to two classes (each defined by a set of operation families) or to a query and a class; in the latter case, the operation families related to the involved operations must be determined
          (6) $S(C_1, Q) = \theta f(C_1 \cap Q) - \alpha f(C_1 - Q) - \beta f(Q - C_1)$
        • Similarity between C1 and Q is defined in terms of:
          - operation families common to C1 and Q, C1 ∩ Q
          - families that pertain to C1 only, C1 - Q
          - families that pertain to Q only, Q - C1
        • The function f refers particularly to operation families, the features that must be compared
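Applying Eq. (6) to a query and a set of candidate classes, and keeping the best-scoring class, gives a simple way to decide a query's class membership. This Python sketch assumes f is cardinality and θ = α = β = 1 by default; the `classify` helper, the tuple encoding of operations, and the placeholder family `x` are hypothetical.

```python
def query_families(ops):
    """Families of (kind, attributes, operator, value) tuples; operator and value ignored."""
    return {(kind, frozenset(attrs)) for kind, attrs, _op, _val in ops}


def inter_class_similarity(class_families, ops, theta=1.0, alpha=1.0, beta=1.0):
    """Eq. (6): S(C, Q) with f = number of operation families."""
    c, q = set(class_families), query_families(ops)
    return theta * len(c & q) - alpha * len(c - q) - beta * len(q - c)


def classify(ops, classes, **weights):
    """Name of the class most similar to the query (ties: first in dict order)."""
    return max(classes, key=lambda name: inter_class_similarity(classes[name], ops, **weights))


ops = [("select", ("R.a1",), "=", "5"),
       ("select", ("R.a2",), ">", "3"),
       ("join", ("R.a3", "R.a4"), "=", None)]
classes = {
    "c1": {("select", frozenset({"R.a1"})), ("select", frozenset({"R.a2"})),
           ("join", frozenset({"R.a3", "R.a4"}))},
    "c2": {("select", frozenset({"R.a1"})), ("select", frozenset({"R.a2"})),
           ("select", frozenset({"x"}))},
}
```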
    26. Inter-class Similarity: Example
        Suppose that we know the class c of the query q and the definitions of the classes c1 and c2:
          q = {o1, o2, o3},  q ∈ c = { R.a1, R.a2, (R.a3, R.a4) }
          c1 = { R.a1, R.a2, (R.a3, R.a4) }
          c2 = { R.a1, R.a2, x }
        • Compute the intersections of c with c1 and c2:
          S(c1, q): c1 ∩ q = { R.a1, R.a2, (R.a3, R.a4) }
          S(c2, q): c2 ∩ q = { R.a1, R.a2 }
        • The query class c is similar to c1
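Plugging the example into Eq. (6) shows why c is closer to c1. The slides list only the intersections, so the weights θ = α = β = 1 and f = set cardinality below are assumptions.

```latex
\begin{aligned}
S(c_1, q) &= 1\cdot\lvert\{R.a_1, R.a_2, (R.a_3, R.a_4)\}\rvert - 1\cdot\lvert\varnothing\rvert - 1\cdot\lvert\varnothing\rvert = 3 - 0 - 0 = 3\\
S(c_2, q) &= 1\cdot\lvert\{R.a_1, R.a_2\}\rvert - 1\cdot\lvert\{x\}\rvert - 1\cdot\lvert\{(R.a_3, R.a_4)\}\rvert = 2 - 1 - 1 = 0
\end{aligned}
```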
    27. Intra-class Similarity
        • Aims to find the most similar queries with respect to a new query (the one to be optimized) within the same class
        • All compared queries are defined by exactly the same operations (operation type and involved attributes); comparison operators or attribute values may differ
        • Similarity between two queries Q1 and Q2 is an increasing function of common operations (operations identical in type, attributes, and operators):
          (7) $S(Q_1, Q_2) = \theta\, o(Q_1 \cap Q_2) - \alpha\, o(Q_1 - Q_2) - \beta\, o(Q_2 - Q_1)$
          - Q1 ∩ Q2: operations common to Q1 and Q2
          - Q1 - Q2: operations that pertain to Q1 but not to Q2
          - Q2 - Q1: operations that pertain to Q2 but not to Q1
        • Find the query that contains the maximum number of operation mappings
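Eq. (7) differs from Eq. (6) only in what counts as a shared feature: whole operations, including operator and constant, rather than families. A Python sketch, assuming o(·) is a simple count and θ = α = β = 1; the stored queries `q2` and `q3` are invented examples from the same class, not the ones compared in the paper.

```python
def intra_class_similarity(q1, q2, theta=1.0, alpha=1.0, beta=1.0):
    """Eq. (7) with o(.) = number of operations; operations must match exactly."""
    a, b = set(q1), set(q2)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


def most_similar(new_query, stored):
    """Key of the stored query with the most operation mappings (highest score)."""
    return max(stored, key=lambda name: intra_class_similarity(new_query, stored[name]))


new_q = {("select", ("Region.nom",), "=", "RA"),
         ("select", ("Resto.spec",), "=", "IT"),
         ("join", ("Resto.vil", "Ville.nom"), "=", None)}
stored = {
    "q2": {("select", ("Region.nom",), "=", "RA"),       # 2 operations in common
           ("select", ("Resto.spec",), "=", "FR"),
           ("join", ("Resto.vil", "Ville.nom"), "=", None)},
    "q3": {("select", ("Region.nom",), "=", "PACA"),     # only the join in common
           ("select", ("Resto.spec",), "=", "FR"),
           ("join", ("Resto.vil", "Ville.nom"), "=", None)},
}
```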
    28. Query Optimizer Architecture
        Two main modules (Figure 10. Optimizer architecture):
        A. Case-based Reasoner
           • Reutilizes the solutions of queries that have already been solved, adapting them to the new situation
           • The more complex of the two modules, but the smarter one
           • Components: 1. Smart Search Engine  2. Adapter  3. Execution Manager  4. Case Base Manager
        B. Execution Plan Generator
           • Generates new solutions (query plans) in a pseudo-random way
           • Simpler and probably faster, but does not apply machine learning techniques
        • Similarity levels between queries indicate which relevant query must be adapted. Adaptation is performed only over the Select and Where clauses: in the Select clause, the attributes to be projected can be modified; in the Where clause, the comparison operators or the values related to the variables can be modified. The From clause cannot be changed because the queried relations cannot be changed.
    29. 29. Case-based Reasoner Adapts solutions of similar queries to the new situation1. Smart Search Engine • retrieves relevant cases • applies Inter and Intra-class Similarity functions • selects the query that minimizes the optimization parameters2. Adapter • adapts the query plan included in the relevant case to query problem specifications • used to facilitate and minimize the cost of the adaptation process 29
    30. Case-based Reasoner
        3. Execution Engine
           • tests the new query execution plan created by the adaptation module
        4. Case Base Manager
           • retains new knowledge in the form of a case
           • also uses the similarity function
    31. Execution Plan Generator
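The slides do not detail how the Execution Plan Generator builds plans, only that it does so pseudo-randomly. One hypothetical way to sketch it in Python is a seeded random left-deep join order with a randomly chosen join algorithm per step; the algorithm names are placeholders, and seeding makes a generated plan reproducible, which may help when its measures are later stored as a case.

```python
import random

JOIN_ALGORITHMS = ("nested_loop_join", "hash_join", "sort_merge_join")  # hypothetical menu


def random_plan(relations, seed=None):
    """Pseudo-randomly build a left-deep plan: a join order plus one algorithm per join."""
    if not relations:
        raise ValueError("at least one relation is required")
    rng = random.Random(seed)
    order = list(relations)
    rng.shuffle(order)
    plan = [("scan", order[0])]
    for relation in order[1:]:
        # Each further relation is joined to the running result with a random algorithm.
        plan.append((rng.choice(JOIN_ALGORITHMS), relation))
    return plan


plan = random_plan(["Resto", "Ville", "Region"], seed=7)
```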