MF-Retarget: Aggregate Awareness in Multiple Fact Table Schema Data Warehouses

Karin Becker, Duncan Dubugras Ruiz, and Kellyne Santos

Faculdade de Informática – Pontifícia Universidade Católica do Rio Grande do Sul
http://www.inf.pucrs.br/{~kbecker | ~duncan}
{kbecker, duncan}@inf.pucrs.br, kellyne@ufs.br

Abstract. Performance is a critical issue in Data Warehouse systems (DWs), due to the large amounts of data manipulated, and the type of analysis performed. A common technique used to improve performance is the use of pre-computed aggregate data, but the use of aggregates must be transparent for DW users. In this work, we present MF-Retarget, a query retargeting mechanism that deals with both conventional star schemas and multiple fact table (MFT) schemas. This type of multidimensional schema is often used to implement a DW using distinct, but interrelated Data Marts. The paper presents the retargeting algorithm and initial performance tests.

1 Introduction

Data warehouses (DW) are analytical databases aimed at providing intuitive access to information useful for decision-making processes. A Data Mart (DM), often referred to as a subject-oriented DW, represents a subset of the DW, comprised of relevant data for a particular business function (e.g. marketing, sales). DW/DM handle large volumes of data, and they are often designed using a star schema, which contains relatively few tables and well-defined join paths. On-line Analytical Processing (OLAP) systems are the predominant front-end tools used in DW environments, which typically explore this multidimensional data structure [3, 13]. OLAP operations (e.g. drill down, roll up, slice and dice) typically result in SQL queries in which aggregation functions (e.g. SUM, COUNT) are applied to fact table attributes, using dimension table attributes as grouping columns (group by clause).
A multiple fact tables (MFT) schema is a variation of the star schema in which there are several fact tables, necessary to represent unrelated facts, facts of different granularity, or even to improve performance [10]. A major use of MFT schemas is to implement a DW through a set of distributed subject-oriented DMs [8, 10], preferably related through a set of conformed dimensions [6], i.e. dimensions that have the same meaning at every possible fact table to which they can be joined. In such an architecture, a major responsibility of the central DW design team is to establish, publish and enforce the conformed dimensions. However, these efforts of the design team are not enough to guarantee the easy combination, by end users, of facts coming from more than one DM. Indeed, the straightforward join of facts and dimensions in MFT schemas imposes a number of restrictions, which cannot always be observed; when they are not, one risks producing incorrect results. Most users do not have the technical skills to realise the subtleties involved and their implications in terms of query formulation. Therefore, for most users, queries involving MFT schemas are more easily handled through appropriate interfaces or specific applications that hide all the difficulties involved.

Y. Manolopoulos and P. Návrat (Eds): ADBIS 2002, pp. 41-51, 2002.

In this paper, we propose MF-Retarget, a query retargeting mechanism that handles MFT schemas and which is additionally aggregate aware. Indeed, precomputed aggregation is one of the most efficient performance strategies to solve queries in DW environments [8]. The retargeting service provides users with transparency from both an aggregate retargeting perspective (aggregate unawareness) and a multiple fact tables' schema complexity perspective, freeing users from query formulation idiosyncrasies. The algorithm is generic and works properly regardless of the number of fact tables involved in the query.

The remainder of this paper is structured as follows. Section 2 presents related work on the use of aggregates. The retargeting algorithm is described in Section 3, and Section 4 presents some initial performance tests. Conclusions and future work are addressed in Section 5.

2 Related Work

2.1 Computation of Aggregates

In a DW, users are most frequently interested in some level of summarisation of the original data. One of the most efficient strategies for handling this problem is the use of pre-computed aggregates for the combination of dimensions/dimension attributes providing the greatest benefit for answering queries (e.g. frequent or expensive queries) [4, 8, 15, 16].

The computation of aggregates can be dynamic or static. In the former case, it is up to the OLAP tool or database engine to decide which aggregates are "beneficial", a concept that varies from tool to tool. Works such as [1, 2, 4, 5, 7, 14] address dynamic computation of aggregates.
These approaches differ from the static context in that they consider not only the cost of executing the query, but also the maintenance/reorganisation costs incurred as queries are processed [2, 5]. In the static context, aggregates are created off-line, and therefore maintenance/reorganisation costs are not that critical. It should be clear that dynamic and static aggregate computations are complementary mechanisms. The former addresses performance tuning from a technical perspective. The latter, addressed in this paper, is essential from a corporate point of view.

Organisationally, static aggregate computation is fundamental because aggregates are created based on corporate decisional requirements, prioritising types of analysis or types of users. Of course, decisional support requirements vary over time, so it is fundamental that the DBA monitors the use of the analytical database in order to revise the necessity of existing and/or new aggregates.

Design alternatives for representing aggregates are extensively discussed in pragmatic literature such as [6, 10]. Storing each aggregate in its own fact table presents many advantages in terms of ease of manipulation, maintenance, performance and storage requirements. Aggregation also leads to smaller representations of dimensions, commonly referred to as shrunken dimensions. Aggregates should, whenever possible, refer to shrunken dimensions, instead of original dimensions. A shrunken dimension is commonly stored in a separate table with its own primary key.

User tools or applications should not reference the aggregate to be used in SQL queries. First, it must be possible to include/remove aggregates without affecting users or existing applications. Second, users cannot be in charge of performance improvement by the selection of the appropriate aggregate.

2.2 Aggregate Retargeting Services

There are three major options where query-retargeting services can be located in the DW architecture: the desktop, the OLAP server or the database engine [16]. The query retargeting service can also be located in between these layers, in case no access to the DBMS engine/OLAP tool source code is provided. Most works in the literature (e.g. [1, 2, 3, 4, 5, 7, 14]) focus on dynamic computation of aggregates, considering strategies that are embedded in query processors, such that the retargeting service can completely change the query execution plan. Dynamic aggregation also considers a specific moment of user analysis (e.g. a sequence of related drills), and not the organisational requirements as a whole.

Kimball et al. [6] sketch a query-retargeting algorithm for statically pre-computed aggregates, which could be inserted as a layer between the front-end tool and the OLAP server/DBMS engine. The algorithm is based on the concept of "family of schemas", composed of one base fact table and all of its related aggregate tables.
One of the advantages of such an algorithm is that it requires very little metadata, basically the size of each fact table and the available attributes for each aggregate. In this paper, we extend this algorithm to deal with MFT schemas. Such an extension is useful for DW architectures implemented by a set of subject-oriented DMs, in which users wish to perform both separate and integrated analysis.

3 MF-Retarget

The striking feature of MF-Retarget is its ability to handle MFT schemas with the use of aggregates, yet providing total transparency for users. In the general case, joining several fact tables requires that each table first be summarised individually until all tables are at the same summarisation level (exactly the same dimensions), and only then joined. However, most users do not have the technical skills to realise the problems involved, nor the requirements in terms of query formulation. See [11] for a deeper discussion on the subtleties involved. Additionally, the query should benefit from the use of aggregates as a performance tuning mechanism. Hence, transparency in the context of MFT schemas must have a twofold meaning: a) aggregate unawareness, and b) MFT join complexity unawareness.
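The need to summarise each fact table before joining can be seen on a toy database. The sketch below is only an illustration: the table layouts, column names and rows are invented for the example (they are not the APB-1 data), and SQLite stands in for the DW engine. The first query joins both fact tables directly; the second summarises each fact table to (division, quarter) and joins the summaries.

```python
import sqlite3

# Hypothetical miniature schema: two fact tables sharing product and time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (code INTEGER, division TEXT);
CREATE TABLE timedim (month INTEGER, quarter TEXT);
CREATE TABLE sales (prod_id INTEGER, time_id INTEGER, chan_id TEXT, units_sold INTEGER);
CREATE TABLE inventory (prod_id INTEGER, time_id INTEGER, cust_id TEXT, stock_units INTEGER);
INSERT INTO product VALUES (1, 'D1'), (2, 'D1');
INSERT INTO timedim VALUES (1, 'Q1'), (2, 'Q1');
INSERT INTO sales VALUES (1, 1, 'A', 10), (1, 1, 'B', 5), (2, 2, 'A', 7);
INSERT INTO inventory VALUES (1, 1, 'c1', 100), (1, 2, 'c1', 50),
                             (2, 2, 'c1', 30), (2, 2, 'c2', 20);
""")

# Direct join of both fact tables: rows are duplicated or dropped by the join.
naive = conn.execute("""
SELECT p.division, t.quarter, SUM(s.units_sold), SUM(i.stock_units)
FROM timedim t, product p, sales s, inventory i
WHERE t.month = s.time_id AND p.code = s.prod_id
  AND t.month = i.time_id AND p.code = i.prod_id
GROUP BY p.division, t.quarter
""").fetchall()

# Summarise each fact table to the same level first, then join the summaries.
correct = conn.execute("""
WITH v1 AS (SELECT p.division AS division, t.quarter AS quarter,
                   SUM(s.units_sold) AS units_sold
            FROM timedim t, product p, sales s
            WHERE t.month = s.time_id AND p.code = s.prod_id
            GROUP BY p.division, t.quarter),
     v2 AS (SELECT p.division AS division, t.quarter AS quarter,
                   SUM(i.stock_units) AS stock_units
            FROM timedim t, product p, inventory i
            WHERE t.month = i.time_id AND p.code = i.prod_id
            GROUP BY p.division, t.quarter)
SELECT v1.division, v1.quarter, v1.units_sold, v2.stock_units
FROM v1, v2
WHERE v1.division = v2.division AND v1.quarter = v2.quarter
""").fetchall()

print(naive)    # totals distorted by the fact-to-fact join
print(correct)  # totals that match each fact table summed on its own
```

With these rows the direct join reports 29 units sold and 250 units in stock, whereas the true totals are 22 and 200: one Sales row is counted twice and one Inventory row is counted twice while another is lost.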
MF-Retarget is a retargeting service intended to lie between the front-end tool and the DBMS, which accepts as input a query written by a user through a proper user interface (e.g. a graphical one intended for naive users, a specific application). The algorithm assumes that:
− Users are unaware of MFT joining complexities, and always write a single query in terms of desired facts and dimensions. The retargeting service is responsible for rewriting the query to produce correct results, assuming as a premise that it is always necessary to bring each fact table to the same summarisation level before joining them.
− Users are unaware of the existence of aggregates, and always formulate the query in terms of the original base tables and dimensions. The retargeting service is responsible for rewriting the query in terms of aggregates, if possible.
− Retargeting queries involving a single fact table is a special case of MFT schemas, and therefore, the algorithm should provide good results in both cases.

The remainder of this section presents an illustration scenario and describes the algorithm. Further details on the algorithm, the required metadata and the MF-Retarget prototype can be obtained in [11].

3.1 Algorithm Illustration

To illustrate the functioning of the algorithm, let us consider the example depicted in Figure 1, a simplification of the MFT schema proposed in the APB-1 OLAP Council benchmark [9]. For each fact table, the fields prefixed by * compose its primary key. In dimension tables, only one field, the lowest one in the hierarchy, composes the primary key (also prefixed by *). The branches show the referential integrity from a fact or aggregate table to each of its dimensions. The MFT schema of Figure 1(a) shows two fact tables (Sales and Inventory), related by three conformed dimensions: Customer, TimeDim and Product. Sales has an additional dimension, namely Channel. Figure 1(a) also shows some possible aggregates for this schema. In the picture, grey boxes correspond to shrunken dimensions, i.e. hierarchic dimensions without one or more lower-level fields.

For example, consider a user who wishes to compare quarterly Sales of product divisions with the corresponding status of the Inventory. This query cannot be answered simply by joining facts of distinct tables, because these facts represent information at different granularities, and therefore they should be brought to the same summarisation level before they can be joined; otherwise inaccurate results will be produced. To free the user from the difficulties involved in MFT schemas, MF-Retarget assumes the user states a single query in terms of facts, dimensions and desired aggregation level (the input shown in Figure 2, for the example considered). The retargeting mechanism then has two goals: to correct the query, and to try to make it more efficient with the use of aggregates.

Considering the aggregates illustrated in Figure 1(a), the algorithm realises that Aggregate4 is the best candidate to answer the question, because it contains all necessary data, is the smallest one, and already joins the Sales and Inventory tables (in that order). In the absence of Aggregate4, Aggregate1 and Aggregate2 will be used. If the algorithm does not find any aggregate that can answer the query in a more efficient way, at least it transforms the query to produce correct results. Figure 3 shows the results from the algorithm for these three situations.

[Figure 1: (a) MFT schema with fact tables Sales and Inventory, dimensions TimeDim, Customer, Channel and Product, shrunken dimensions Sh_TimeDim, Sh_Customer and Sh_Product, and aggregates Aggregate1 to Aggregate4; (b) derivation graph relating Sales, Inventory and Aggregate1 to Aggregate4.]
Fig. 1. MFT schema and possible aggregates, and schema derivation graph

It should be pointed out that the best aggregate is not always the one that already joins distinct fact tables. Indeed, in case smaller individual aggregates exist, the cost of joining them can be smaller than the cost of summarising a much bigger joined pre-computed aggregate.

3.2 The Algorithm

The algorithm assumes that users need to specify only the tables (fact/dimensions), the grouping columns (which are the same ones listed in the select clause), the summarisation functions applied to the measurements, and possibly additional restrictions in the where clause. It considers the following restrictions on input queries: a) monoblock queries (select from where group by); b) only transitive aggregation functions are used; c) all dimensions listed in the from clause apply to all fact tables listed.

For the algorithm, the relationship between schemas is represented by a directed acyclic graph G(V, E). In the graph, V represents a set of star schemas, and E corresponds to the set of derivation relationships between any two schemas.
The edges of E form derivation paths, meaning that the schema at the end of any path can be derived by aggregating the schema at the start of that path. The use of graph structures for representing relationships between aggregates is well known [4, 13]. Figure 1(b) presents the derivation graph for the example of Figure 1(a). We assume that only transitive aggregation functions (i.e. SUM, MAX and MIN) are used in both the derivation relationships and queries.
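The restriction to transitive functions is what makes derivation paths safe to follow: a result computed from an aggregate must equal the result computed from the base rows. The sketch below checks this on made-up numbers (the monthly values and the helper name reaggregate are illustrative only). The mean is included as a counter-example of a function without this property; it is not discussed in the paper.

```python
from statistics import mean

# Hypothetical base measurements, grouped by month within one quarter.
monthly_rows = {"Jan": [4, 6], "Feb": [10], "Mar": [1, 2, 3]}
all_rows = [v for rows in monthly_rows.values() for v in rows]

def reaggregate(func):
    """Apply func per month (the aggregate), then again over the monthly results."""
    per_month = [func(rows) for rows in monthly_rows.values()]
    return func(per_month)

# SUM, MAX and MIN give the same answer from the aggregate as from the base rows.
for func in (sum, max, min):
    assert reaggregate(func) == func(all_rows)

# The mean of monthly means differs from the mean of the base rows.
print(reaggregate(mean), mean(all_rows))
```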
"The units sold and units in stock, per quarter and product division"

    Select P.Division Division, T.Quarter Quarter,
           SUM(S.UnitsSold) UnitsSold, SUM(I.StockUnits) StockUnits
    From TimeDim T, Product P, Sales S, Inventory I
    Where T.Month=S.Time_ID and P.Code=S.Prod_ID
      and T.Month=I.Time_ID and P.Code=I.Prod_ID
    Group by P.Division, T.Quarter

Fig. 2. Input SQL query from a naive DW user

a) considering the existence of Aggregate4:

    Select P.Division Division, T.Quarter Quarter,
           SUM(UnitsSold) UnitsSold, SUM(StockUnits) StockUnits
    From Sh_TimeDim T, Sh_Product P, Aggregate4 A4
    Where T.Quarter=A4.Time_ID and P.Line=A4.Prod_ID
    Group by P.Division, T.Quarter

b) in the absence of Aggregate4:

    Create view V1 (Division, Quarter, UnitsSold) as
    Select P.Division, T.Quarter, SUM(A1.UnitsSold)
    From Sh_TimeDim T, Product P, Aggregate1 A1
    Where T.Quarter=A1.Time_ID and P.Code=A1.Prod_ID
    Group by P.Division, T.Quarter

    Create view V2 (Division, Quarter, StockUnits) as
    Select P.Division, T.Quarter, SUM(A2.StockUnits)
    From Sh_TimeDim T, Product P, Aggregate2 A2
    Where T.Quarter=A2.Time_ID and P.Code=A2.Prod_ID
    Group by P.Division, T.Quarter

    Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
    From V1, V2
    Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

c) if no aggregates are found:

    Create view V1 (Division, Quarter, UnitsSold) as
    Select P.Division, T.Quarter, SUM(S.UnitsSold)
    From TimeDim T, Product P, Sales S
    Where T.Month=S.Time_ID and P.Code=S.Prod_ID
    Group by P.Division, T.Quarter

    Create view V2 (Division, Quarter, StockUnits) as
    Select P.Division, T.Quarter, SUM(I.StockUnits)
    From TimeDim T, Product P, Inventory I
    Where T.Month=I.Time_ID and P.Code=I.Prod_ID
    Group by P.Division, T.Quarter

    Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
    From V1, V2
    Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

Fig. 3. Possible outputs of the algorithm

The algorithm is divided into 4 steps, which for clarity are presented individually and illustrated using the example of Section 3.1:
1. Divide the original query into component queries;
2. For each component query, select candidate schema(s) for answering the query;
3. Select best candidates;
4. Rewrite the query.

Step 1: Division into Component Queries.

For each fact table Fi listed in the from clause of the original query Q, a component query Ci (i>0) is created, according to the following algorithm:
1. For each fact table Fi listed in the from clause of Q, create a component query Ci such that:
   1.1. Ci from clause := Fi and all dimensions listed in the from clause of Q;
   1.2. Ci where clause := all join conditions of Q necessary to relate Fi to the dimensions, together with any additional conditions involving these dimensions or Fi;
   1.3. Ci group by clause := all attributes used in the group by clause of Q;
   1.4. Ci select clause := all attributes used in the group by clause of Q, in addition to all aggregation function(s) applied to Fi attributes.
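Step 1 can be sketched as follows. This is a minimal illustration, not the prototype described in [11]: the Query record, its field layout and the function name component_queries are assumptions made for the example, and where-clause conditions are kept as opaque text tagged with the tables they reference rather than parsed from SQL.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    facts: tuple       # fact tables in the from clause
    dimensions: tuple  # dimension tables in the from clause
    conditions: tuple  # (tables referenced, SQL text) pairs from the where clause
    group_by: tuple    # grouping columns (also listed in the select clause)
    aggregates: tuple  # (function, fact table, column) triples

def component_queries(q):
    """Create one component query Ci per fact table Fi of q (steps 1.1 to 1.4)."""
    components = []
    for fact in q.facts:
        other_facts = set(q.facts) - {fact}
        components.append(replace(
            q,
            facts=(fact,),  # 1.1: Fi plus all dimensions (dimensions unchanged)
            # 1.2: keep conditions that do not mention another fact table
            conditions=tuple(c for c in q.conditions
                             if not (set(c[0]) & other_facts)),
            # 1.3 is implicit (group_by unchanged); 1.4: aggregates over Fi only
            aggregates=tuple(a for a in q.aggregates if a[1] == fact),
        ))
    return components

# The input query of Figure 2, encoded by hand.
q = Query(
    facts=("Sales", "Inventory"),
    dimensions=("TimeDim", "Product"),
    conditions=(
        (("TimeDim", "Sales"), "T.Month=S.Time_ID"),
        (("Product", "Sales"), "P.Code=S.Prod_ID"),
        (("TimeDim", "Inventory"), "T.Month=I.Time_ID"),
        (("Product", "Inventory"), "P.Code=I.Prod_ID"),
    ),
    group_by=("P.Division", "T.Quarter"),
    aggregates=(("SUM", "Sales", "UnitsSold"),
                ("SUM", "Inventory", "StockUnits")),
)
c1, c2 = component_queries(q)
print(c1)
print(c2)
```

Applying component_queries to a query with a single fact table returns that query unchanged, which matches the treatment of single-fact queries as a special case.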
Notice that a query referring to a single fact table is treated as a special case of queries involving several fact tables. In that case, Step 1 produces a single component query that is equal to the original query Q. Figure 4 shows the component queries created for the input query illustrated in Figure 2.

    C1 -> Select P.Division, T.Quarter, SUM(S.UnitsSold) UnitsSold
          From TimeDim T, Product P, Sales S
          Where T.Month=S.Time_ID and P.Code=S.Prod_ID
          Group by P.Division, T.Quarter

    C2 -> Select P.Division, T.Quarter, SUM(I.StockUnits) StockUnits
          From TimeDim T, Product P, Inventory I
          Where T.Month=I.Time_ID and P.Code=I.Prod_ID
          Group by P.Division, T.Quarter

Fig. 4. Component queries for input of Figure 2

Step 2: Candidates for Component Queries

This step generates, for each component query Ci (i>0) resulting from Step 1, the respective candidate set CSi. Each candidate belonging to CSi is a schema (base or aggregate) that answers Ci.
2. For each component query Ci generated in Step 1:
   2.1. Let n := the node that corresponds to the base schema of Ci; mark n as "visited"; let CSi := n;
   2.2. Using a depth-first traversal, examine all schemas derived from n until all nodes that can be reached from it are marked as "visited":
        2.2.1. Let n := next node; mark n as "visited";
        2.2.2. If all query attributes (select and where clauses) of Ci belong to schema n and each aggregation function of the select clause of Ci is exactly the same as one used in the fact table of schema n
               Then CSi := CSi ∪ n;
               Else mark all nodes that can be reached from n as "visited".

Each time this step is executed for a component query, the graph is traversed using a depth-first algorithm starting from the corresponding base schema. When the algorithm detects that a schema cannot answer the component query, all schemas at the end of a derivation path starting from it are disregarded.
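The traversal of Step 2 can be sketched as below. The edges of GRAPH and the attribute and measure sets in SCHEMAS are assumptions made for the example, since the text does not list them explicitly; they are chosen to be consistent with the aggregates described for Figure 1. The function name candidate_set is likewise hypothetical.

```python
# Assumed derivation edges: schema -> schemas derived directly from it.
GRAPH = {
    "Sales": ["Aggregate1"],
    "Inventory": ["Aggregate2", "Aggregate3"],
    "Aggregate1": ["Aggregate4"],
    "Aggregate2": ["Aggregate4"],
    "Aggregate3": [],
    "Aggregate4": [],
}

# Assumed metadata: attributes reachable from each aggregate and its measures.
SCHEMAS = {
    "Aggregate1": {"attrs": {"Quarter", "Year", "Code", "Division"},
                   "measures": {("SUM", "UnitsSold"), ("SUM", "DollarSales")}},
    "Aggregate2": {"attrs": {"Quarter", "Year", "Code", "Division"},
                   "measures": {("SUM", "StockUnits")}},
    "Aggregate3": {"attrs": {"Month", "Quarter", "Year", "Code", "Division"},
                   "measures": {("SUM", "StockUnits")}},
    "Aggregate4": {"attrs": {"Quarter", "Year", "Line", "Division"},
                   "measures": {("SUM", "UnitsSold"), ("SUM", "DollarSales"),
                                ("SUM", "StockUnits")}},
}

def candidate_set(base, attrs, measures):
    """Depth-first search from the base schema, pruning below any failing node."""
    visited = {base}
    candidates = [base]  # 2.1: the base schema always answers its own query

    def mark_reachable(node):
        # Else-branch of 2.2.2: everything derived from a failing schema is skipped.
        for child in GRAPH[node]:
            if child not in visited:
                visited.add(child)
                mark_reachable(child)

    def visit(node):
        for child in GRAPH[node]:
            if child in visited:
                continue
            visited.add(child)
            schema = SCHEMAS[child]
            if attrs <= schema["attrs"] and measures <= schema["measures"]:
                candidates.append(child)
                visit(child)
            else:
                mark_reachable(child)

    visit(base)
    return candidates

cs1 = candidate_set("Sales", {"Division", "Quarter"}, {("SUM", "UnitsSold")})
cs2 = candidate_set("Inventory", {"Division", "Quarter"}, {("SUM", "StockUnits")})
print(cs1)
print(cs2)
```

Under these assumptions, a query that groups by Month instead of Quarter fails at Aggregate1, so Aggregate4 is never examined and only the base table Sales remains as a candidate.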
Each CSi element is a valid candidate to replace Fi in the from clause of Q. Therefore, every tuple (e1, …, en), where e1 ∈ CS1, …, en ∈ CSn, is a valid combination for the rewrite of Q. Considering the graph depicted in Figure 1(b), and the component queries of Figure 4:
− CS1 = {Sales, Aggregate1, Aggregate4} for component query C1;
− CS2 = {Inventory, Aggregate2, Aggregate3, Aggregate4} for C2.

Step 3: Selection of Best Candidates

Let T be the set of tuples (e1, …, en), where e1 ∈ CS1, …, en ∈ CSn (n>0), representing the Cartesian product of candidate sets CS1 × … × CSn, and let t be a tuple of T. The present version of the algorithm chooses the best tuple based on the concept of accumulated size. The accumulated size of t(e1, …, en), AS(t), is a function that returns the number of records that must be handled if the query were rewritten using t. When summing the number of records, AS(t) counts the size of a given table only once, in case it is included in more than one candidate set CSi. Thus, if Aggregate4 is chosen, only its records need to be processed, and only once. In all other cases, records from the different fact tables in t are processed, considering each table only once.

This may suggest that, in a multi-fact query, the best t will always be the one where e1 = … = en. However, this is not true. Indeed, the cost of processing more records (I/O cost) has a stronger impact than the cost of joining tables. Notice that this step can be improved in many ways, by varying the cost function used to prioritise the candidates for query rewrite. An immediate improvement of this function is the consideration of index information to be combined with table size.
3. Consider all CSi sets generated in Step 2 and T, the Cartesian product of candidate sets CS1 × … × CSn:
   3.1. t := the tuple t(e1, …, en) ∈ T with the smallest accumulated size AS(t), considering all t(e1, …, en) ∈ T, where e1 ∈ CS1, …, en ∈ CSn.

Step 4: Query Reformulation

Once the best candidate for each component query is determined, the query is rewritten. If the set of best candidates resulting from Step 3 has a single element, i.e. a common aggregate for all component queries, a single query is written using that aggregate and the respective shrunken dimensions. This is the case for our example, where Figure 3(a) displays the rewritten query. Otherwise, the query is rewritten in terms of views that summarise the best aggregates individually, and then join them (e.g. as in Figure 3(b) and (c)). If there is a common best candidate that answers more than one component query, but not all of them, a single view is created for that set of component queries. This trivial algorithm is not presented due to space limitations.

4 Tests

Initial performance tests were performed based on the MFT schema presented in the APB-1 OLAP Benchmark [9], which comprises 4 fact tables.
For the tests, we used Inventory, Sales and the corresponding dimensions, as depicted in Figure 1(a). The aggregates were defined to experiment with performance under different levels of aggregation (compression factor). We did not use the semantics of the aggregates, nor the user requirements expressed in the benchmark (e.g. queries). We also disregarded the number of records of the resulting database for aggregate selection. Two tests were executed, referred to as Test1 and Test2, described in the remainder of this section.

4.1 Test1

The goal of Test1 was to verify whether the algorithm performed well and correctly, considering both star and MFT schemas. The APB-1 program was executed with parameters 10, 0.1 and 10, which resulted in a Sales table with 1,239,300 records, whereas Inventory comprised 270,000 records. We executed five queries, three of them involving a single fact table (Sales), and two of them the join of both fact tables. For each query, we calculated the number of records of the resulting table, and the processing time considering all possible alternatives.
It was possible to verify that the algorithm always chose the aggregate with the smallest processing time, regardless of whether the query involved a single fact table or a join of multiple fact tables. Proportionally, there was a significant gain in the vast majority of cases, but absolute gains were not always significant. The magnitude of the performance gain seems to be a function of the (fact) table size, the aggregate compression factor, and the output table size.

4.2 Test2

Test2 was executed running APB-1 with parameters 10, 1 and 10. The derivation graph for this test is depicted in Figure 5, and Table 1 describes the properties of the schemas: number of records, compression factor (CF) with regard to both the base fact table and its deriving schema(s), and the difference between the derived/deriving schemas in terms of dimensions.

[Figure 5: derivation graph over the schemas Inventory, Sales, I1, S1, I2S2, I3 and S3.]
Fig. 5. Derivation graph used in Test2

We executed a single query that involved facts from both the Sales and Inventory tables, and which could be answered by (the combination of) all aggregates. The goal was to compare the respective absolute processing times. Table 2 displays in the first row the elapsed time measured for each execution. Subsequent rows show the gains in using an aggregate with regard to bigger alternatives. For instance, the use of aggregate I2S2 represents a gain of 52% with respect to the join of I1 and S1, calculated as (time(I1 and S1) - time(I2S2)) / time(I1 and S1), and 96% with regard to the join of Inventory and Sales. It is possible to verify that the gains considering absolute times are very significant this time.
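The accumulated size of Step 3 and the gain formula above can be checked against the figures reported in Tables 1 and 2. The sketch below is illustrative only: the record counts and elapsed times are copied from those tables, while the function names and the two candidate sets (every schema of Figure 5 that holds the relevant fact) are assumptions made for the example.

```python
from itertools import product

# Record counts from Table 1.
SIZES = {
    "Sales": 13_122_000, "S1": 2_400_948, "S3": 614_520,
    "Inventory": 12_396_150, "I1": 2_496_762, "I3": 631_800,
    "I2S2": 2_400_948,
}

# Elapsed times from Table 2 (hours:min:sec).
TIMES = {
    "Inventory and Sales": "20:35:43",
    "I1 and S1": "1:27:04",
    "I2S2": "0:42:09",
    "I3 and S3": "0:15:22",
}

def accumulated_size(t):
    """AS(t): total records handled, counting a shared table only once."""
    return sum(SIZES[name] for name in set(t))

def seconds(hms):
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + secs

def gain(slower, faster):
    """Relative gain of the faster alternative over the slower one."""
    return (seconds(TIMES[slower]) - seconds(TIMES[faster])) / seconds(TIMES[slower])

# Assumed candidate sets for the Sales and Inventory component queries.
cs_sales = ["Sales", "S1", "I2S2", "S3"]
cs_inventory = ["Inventory", "I1", "I2S2", "I3"]

# Step 3: pick the tuple of the Cartesian product with the smallest AS(t).
best = min(product(cs_sales, cs_inventory), key=accumulated_size)

print(best, accumulated_size(best))
print(round(100 * gain("I1 and S1", "I2S2")))
```

The tuple (I2S2, I2S2) is counted once, giving 2,400,948 records, yet the pair of smaller individual aggregates S3 and I3 has a lower accumulated size, which agrees with the fastest time in Table 2.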
The use of the accumulated size AS(t) as the main criterion to prioritise aggregate candidates seems to be simple but efficient, although it can still be improved in many ways, particularly by taking indexing into account.

5 Conclusions

In this work we presented MF-Retarget, a retargeting mechanism that deals with both MFT schemas and statically computed aggregates. The algorithm provides two types of transparency: a) aggregate unawareness, and b) users are spared from the complexities of queries in MFT schemas.

This retargeting service is intended to be implemented as a layer between the user front-end tool and the DBMS engine. Thus, it can be complementary to gains already provided by OLAP tools/DBMS engines in the context of dynamic computation of aggregates. Further details on the implementation can be obtained in [11].
Tab. 1. Inventory/Sales derivation graph description

    Table or        Records      CF (base       Deriving     CF (deriving    Shrunken dims. /
    aggregate                    schema)        schema       schema)         eliminated dims.
    Sales (S)       13,122,000
    S1               2,400,948   18.3 %         Sales        18.3 %          Yes / Yes
    S3                 614,520   4.7 %          S1           25.6 %          Yes
    Inventory (I)   12,396,150
    I1               2,496,762   20.1 %         Inventory    20.1 %          Yes
    I3                 631,800   5.1 %          I1           25.3 %          Yes
    I2S2             2,400,948   18.3 % (S),    S1 and I1    100 % (S1),
                                 19.4 % (I)                  96.2 % (I1)

Tab. 2. Results with larger tables

    Query1: 614,520 rec.    Inventory and Sales   I1 and S1    I2S2         I3 and S3
    Time (hours:min:sec)    20:35:43              1:27:04      0:42:09      0:15:22
    Inventory/Sales (%)                           93           96           99
    I1 and S1 (%)                                              52           82
    I2S2 (%)                                                                64
    AS(t)                   25,518,150            4,897,710    2,400,948    1,246,320

Preliminary tests confirmed that the algorithm always provided the best response time. Proportional gains are always significant, but absolute gains increase with bigger fact tables. Additional tests are required to determine precise directives for the construction of aggregates in MFT schemas, and under which circumstances the processing gains are significant. It is also important to refine our criteria for selecting the best candidate, in particular by using indexing information in addition to the number of records in each table.

Future work also includes, among other topics, a better definition of cost functions for prioritising candidate aggregates, the use of indexes in the cost function, the integration of the retargeting mechanism into a DW architecture, support for aggregate monitoring and recommendations for aggregate reorganisation, and the use of the proposed algorithm in the context of dynamic aggregate computation.

Acknowledgements

This work was partially financed by FAPERGS, Brazil.

References

1. Baralis, E., Paraboschi, S., Teniente, E. Materialized Views Selection in a Multidimensional Database. Proceedings of the VLDB'97 (1997). 156-165.
2. Chaudhuri, S., Shim, K. An Overview of Cost-Based Optimization of Queries with Aggregates. Bulletin of TCDE (IEEE), v. 18 n. 3 (1995). 3-9.
3. Gray, J., Chaudhuri, S. et al. Data cube: a relational aggregation operator generalizing Group-by, Cross-tab and Subtotals. Data Mining and Knowledge Discovery, v. 1, n. 1 (1997). 29-53.
4. Gupta, A., Harinarayan, V., Quass, D. Aggregate-query Processing in Data Warehousing Environments. Proceedings of the VLDB'95 (1995). 358-369.
5. Gupta, H., Mumick, I. Selection of views to materialize under a maintenance cost constraint. Proceedings of the ICDT (1999). 453-470.
6. Kimball, R. et al. The Data Warehouse Lifecycle Toolkit: expert methods for designing, developing, and deploying data warehouses. John Wiley & Sons (1998).
7. Kotidis, Y., Roussopoulos, N. Dynamat: A Dynamic View Management System for Data Warehouse. Proceedings of the ACM SIGMOD 1999 (1999). 371-382.
8. Meredith, M., Khader, A. Divide and Aggregate: designing large warehouses. Database Programming & Design. June (1996). 24-30.
9. OLAP Council. APB-1 OLAP Benchmark (Release II). Online. Captured in May 2000. Available at: http://www.olapcouncil.org/research/bmarkco.htm, Nov. 1998.
10. Poe, V., Klauer, P., Brobst, S. Building a Data Warehouse for Decision Support, 2nd edition. Prentice Hall (1998).
11. Santos, K. MF-Retarget: a multiple-fact table aggregate retargeting mechanism. MSc. Dissertation. Faculdade de Informatica - PUCRS, Brazil (2000). (in Portuguese)
12. Sapia, C. On modeling and Predicting Query Behavior in OLAP Systems. Proceedings of the DMDW'99 (1999).
13. Sarawagi, S., Agrawal, R., Gupta, A. On Computing the Data Cube. IBM Almaden Research Center: Technical Report. San Jose, CA (1996).
14. Srivastava, D., et al. Answering Queries with Aggregation Using Views. Proceedings of the VLDB'96 (1996). 318-329.
15. Wekind, H., et al. Preaggregation in Multidimensional Data Warehouse Environments. Proceedings of the ISDSS97 (1997). 581-59
16. Winter, R. Be Aggregate Aware. Intelligent Enterprise Magazine, v. 2 n. 13 (1999).