ppt slides


Published in: Technology, Spiritual
  • Progressive update for enterprise information systems: System updates mostly affect data rather than business logic. A "shutdown-restart" update is too expensive and, in some settings, impossible. Multiple versions must work together until the update is finished. For large and complex applications the change usually happens as a gradual evolution rather than a one-time switch, and this matters to users as well as to IT.
    Smooth data evolution for system upgrade: Performance depends on how data is distributed in the multi-version environment. One option is to maintain two versions and synchronize the two databases on every transaction; another is to keep a single version (new or old). We propose a hybrid with dynamic schema evolution. System performance needs to stay stable during the evolution period.
    Workflow integration for homogeneous information systems: Schema mapping and matching cannot solve the data-sharing problem on their own. ETL is costly and does not always fit loosely coupled systems.
    Data-outside services in a SaaS environment: Data and applications can be separated (data inside vs. data outside), and there can be services for applications as well as services for data. Data access can be offered to applications as a service: we can take existing databases and applications and provide services on top of them. The applications then do not need to know how the database schema is designed. SQL describes what an application wants, independent of how the schema is actually implemented. So, when multiple versions or multiple applications access the same database system, the physical schema can be designed automatically instead of maintaining multiple versions of the schema. This also holds for progressive application evolution.
    Improvement in performance, space, and applicability.
  • Application upgrade during progressive update
    Database schema: old schema So = {LineItem(k, qty)}; new schema Sn = {LineItemWithComment(k, qty, c)}.
    Query statistics: the frequency of queries against the old schema decreases step by step while the frequency of queries against the new schema increases step by step.
    Data distribution: the number of distinct values in column c grows step by step, from a single "null" value to many.
    Dilemma for "shutdown-restart" schema evolution: In the early period, a hybrid schema St = {LineItem(k, qty), Comment(k, comment)} is better than both So and Sn, because it contains enough information to execute the queries of the new version, and the new queries still perform reasonably well since the join cost in St is small. In the final period of the evolution, Sn is better than both So and St, because it performs reasonably well for the queries of both versions.
    How and when to evolve the old schema into the new schema is the topic of this work. The schema evolution should take both the old and the new schema into account. The cost of each candidate schema should be calculated from the following variables: query statistics, data distribution, how the constraints are handled, ….
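The trade-off between So, St and Sn described above can be illustrated with a toy cost model. This is a minimal sketch under loud assumptions, not the authors' cost model: it assumes a full-scan cost measured in pages, borrows the rows-per-page figures from the motivating example purely for illustration, and introduces a hypothetical `join_overhead` factor. The names `workload_cost` and `best_schema` are invented for this example.

```python
import math

ROWS = 1_000_000  # rows in LineItem (illustrative figure)

def pages(rows, rows_per_page):
    """Number of pages needed to store `rows` rows."""
    return math.ceil(rows / rows_per_page)

def workload_cost(schema, new_fraction, filled_fraction, join_overhead=1.5):
    """Expected pages read per query (assumed full-scan cost model).

    new_fraction:    share of queries issued by the new application version
    filled_fraction: share of rows that already carry a non-null comment
    join_overhead:   hypothetical multiplier for joining LineItem and Comment
    """
    narrow = pages(ROWS, 1_000)                         # LineItem(k, qty)
    wide = pages(ROWS, 90)                              # LineItemWithComment(k, qty, c)
    comment = pages(round(ROWS * filled_fraction), 100) # Comment(k, comment)
    if schema == "So":
        old_cost, new_cost = narrow, math.inf           # So cannot answer new queries
    elif schema == "St":
        old_cost, new_cost = narrow, join_overhead * (narrow + comment)
    elif schema == "Sn":
        old_cost, new_cost = wide, wide
    else:
        raise ValueError(f"unknown schema: {schema}")
    cost = (1 - new_fraction) * old_cost
    if new_fraction > 0:                                # avoid 0 * inf
        cost += new_fraction * new_cost
    return cost

def best_schema(new_fraction, filled_fraction):
    """Schema with the lowest expected cost for the given statistics."""
    return min(("So", "St", "Sn"),
               key=lambda s: workload_cost(s, new_fraction, filled_fraction))

early = best_schema(new_fraction=0.1, filled_fraction=0.01)
late = best_schema(new_fraction=1.0, filled_fraction=1.0)
```

Under these assumed numbers the hybrid schema wins while new queries and comments are rare, and the new schema wins once every query is new and every row carries a comment.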
  • Create table: needs a special constraint p that defines how the new column is added. Online evolution; the cost is zero. It is an exclusive evolution for those "new queries" that need the data of p.
    Combine tables: needs a reference f that determines how the tables are combined. Offline evolution; the evolution cost is calculated from table statistics.
    Split table: builds a new reference f to preserve the constraint. Offline evolution; the evolution cost is calculated from table statistics.
  • Version 1: schema T1, T2, T3; queries Q1: T1 × T2 × T3 and Q2: T1 × T2.
    Version 2: schema T4, T5, T6, T7; queries Q3: T4 × T5, Q4: T5 × T6 × T7 and Q5: T4 × T6.
    In the schema mappings shown on the left, "m" is a new column in the new version. We can calculate that at least 5 basic operators must be used in this evolution, which gives several "intermediate schemas" during the migration process.

    1. Support Multi-version Applications in SaaS via Progressive Schema Evolution (Draft)
       WISS 2009, Shanghai, China, March 29th, 2009
       Jianfeng Yan, SAP Research Center - China, Shanghai. TEL: +86 6108 3896. www.sap.com/china/company/sapresearch/en/index.htm
       Bo Zhang*, University of Shanghai for Science and Technology. TEL: +86 5527 1217. www.usst.edu.cn
       *Work done while with SAP
    2. Issue in Multi-version Applications: Hard to satisfy both old and new customers
         • Progressive update for enterprise information systems
         • Smooth data migration for system upgrade
         • Workflow integration for homogeneous information systems
         • Data-outside services in SaaS environment
         • Improvement in performance, space, and applicability
    3. Real Motivating Example: What's the impact on storage
         • Old schema (Key, Qty): 1,000,000 rows, 1,000 rows / page, 1,000 pages
         • New schema (Key, Qty, Qty2, Comment): 1,000,000 rows, 90 rows / page, ~11,000 pages
         • Intermediate schema: (Key, Qty) with 1,000,000 rows at 1,000 rows / page, plus (Key, Qty2, Comment) with 10,000 rows at 100 rows / page; 1,000 + 100 = 1,100 pages
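The storage figures on this slide follow from dividing row counts by rows per page. The short sketch below reproduces that arithmetic; the helper name `pages` is an assumption made for this example, and rounding up to whole pages is assumed as well.

```python
import math

def pages(rows, rows_per_page):
    """Pages needed to store `rows` rows, rounding up to whole pages."""
    return math.ceil(rows / rows_per_page)

# Old schema: one narrow table (Key, Qty)
old_pages = pages(1_000_000, 1_000)

# New schema: one wide table (Key, Qty, Qty2, Comment)
new_pages = pages(1_000_000, 90)

# Intermediate schema: the narrow table plus a side table
# (Key, Qty2, Comment) that holds only the 10,000 rows with new data
intermediate_pages = pages(1_000_000, 1_000) + pages(10_000, 100)
```

With these inputs the new schema needs roughly eleven times the pages of the old one, while the intermediate schema adds only about ten percent.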
    4. Begin to Evolve the Schema: Snapshots from DB side
       [Figure: along the time dimension, an old system snapshot (old version applications, DB for old version), an evolution stage (mixed applications, adaptive query rewriter, DB for evolution), and a new system snapshot (new version applications, DB for new version).]
       Progressive schema evolution, instead of a "shutdown-restart" solution
    5. Related Works: Many, but more is needed
         • Data partition (DP)
             • Vertical DP focuses on attribute-level data partitioning, not on the table level
             • Horizontal DP always assumes a unique schema
             • DP does not discuss performance in the data evolution scenario
         • Schema mapping and matching
             • Feasible techniques are used to figure out the relationships between different schemas
             • A proper schema still needs to be chosen as the data and query distributions change
         • ETL (Extract-Transform-Load)
             • Most existing ETL tools use offline techniques and provide piecemeal usage
             • To the best of our knowledge, there is no ETL system for multi-version data evolution
         • Also related to temporal DB
             • We are dealing with multiple versions of schemas accessed by multiple versions of applications
    6. Preparation for Schema Evolution: Basic evolution operators
         • Create table
             • Needs a special constraint p to define how to add the new column
             • Online evolution, cost is zero
             • It is an exclusive evolution for those "new queries" that need the data of p
         • Combine tables
             • Needs a reference f to determine how to combine the tables
             • Offline evolution, evolution cost is calculated from table statistics
         • Split table
             • Builds a new reference f to keep the constraint
             • Offline evolution, evolution cost is calculated from table statistics
       [Figure: source schema S and target schema T for each operator. Create table: T1(a, b, c) becomes t1(a, b, c) and t2(b, p). Combine tables: T1(a, b) and T2(b, c) become t1(a, b, c) via f. Split table: T1(a, b, c) becomes t1(a, b) and t2(b, c) via f.]
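The three operators can be pictured on toy in-memory tables. The following is only an illustrative sketch, assuming rows are dictionaries, that column b is a unique key playing the role of the reference f, and that the constraint p is a plain function producing the new column's value. The function names are hypothetical and do not come from the slides.

```python
def create_table(rows, key, new_col, p):
    """Create a side table t2(key, new_col); p supplies each new value."""
    return [{key: r[key], new_col: p(r)} for r in rows]

def split_table(rows, left_cols, right_cols):
    """Split one table into two projections that share the key column."""
    t1 = [{c: r[c] for c in left_cols} for r in rows]
    t2 = [{c: r[c] for c in right_cols} for r in rows]
    return t1, t2

def combine_tables(t1, t2, key):
    """Combine two tables by joining them on the shared key column."""
    index = {r[key]: r for r in t2}  # assumes key is unique in t2
    return [{**r, **index[r[key]]} for r in t1 if r[key] in index]

# Source table T1(a, b, c)
T1 = [
    {"a": 1, "b": 10, "c": "x"},
    {"a": 2, "b": 20, "c": "y"},
]

# Create table: new column p starts out empty for every row
t2_new = create_table(T1, key="b", new_col="p", p=lambda row: None)

# Split T1(a, b, c) into t1(a, b) and t2(b, c), then combine them again
t1, t2 = split_table(T1, left_cols=["a", "b"], right_cols=["b", "c"])
restored = combine_tables(t1, t2, key="b")
```

In this sketch splitting and then combining on the shared key returns the original rows, which is the property the reference f is meant to preserve.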
    7. Apply the Operators for Evolution: An example
         • Such an evolution process is not unique
       [Figure: five numbered steps that evolve the source schema T1, T2, T3 into the target schema T4, T5, T6, T7 through intermediate schemas (with intermediate tables t1, t2, t3 and the new column m).]
    8. Large Picture of Progressive Schema Evolution
       [Figure: workflow in three stages (1, 2, 3) with the following elements: basic evolution operators; mapping calculation; schema mappings toward target; list the possible evolution strategies; collect workload and data distribution (data statistics, workload distribution); estimate the cost; choose the best evolution strategy; schema update with data loading process; erase applied operators.]
    9. Problem Modeling: The search space is large
       At each partial evolution point, several basic operators can be chosen according to the cost of their target intermediate schemas S', S'', ….
       [Figure: timeline from source schema S to target schema T across four evolution points, showing potential and chosen partial evolutions through intermediate schemas S', S'', S'''.]
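One way to picture a step-by-step choice at each evolution point is a greedy loop that tries every subset of the not-yet-applied operators and keeps the cheapest resulting intermediate schema. The sketch below is an assumption-laden illustration, not the authors' algorithm: a schema is modeled simply as the set of operators applied so far, operator ordering and dependencies are ignored, and the cost function, the operator names `create` and `combine`, and all numbers are hypothetical.

```python
from itertools import combinations

def greedy_evolution(operators, cost, num_points):
    """At each evolution point, pick the cheapest reachable intermediate schema.

    A schema is represented as the frozenset of operators applied so far.
    cost(schema, point) is a caller-supplied estimate (hypothetical here).
    """
    applied = frozenset()
    plan = []
    for point in range(num_points):
        remaining = [op for op in operators if op not in applied]
        # Every subset of remaining operators gives one candidate schema
        candidates = [
            applied | frozenset(subset)
            for k in range(len(remaining) + 1)
            for subset in combinations(remaining, k)
        ]
        applied = min(candidates, key=lambda s: cost(s, point))
        plan.append(applied)
    return plan

# Hypothetical statistics: share of new-version queries at each point
NEW_SHARE = [0.1, 0.5, 0.9]
# Hypothetical penalty an applied operator imposes on old queries
OLD_PENALTY = {"create": 0, "combine": 10}
# Hypothetical penalty a missing operator imposes on new queries
NEW_PENALTY = {"create": 8, "combine": 4}

def toy_cost(schema, point):
    share = NEW_SHARE[point]
    old_part = sum(OLD_PENALTY[op] for op in schema)
    new_part = sum(v for op, v in NEW_PENALTY.items() if op not in schema)
    return (1 - share) * old_part + share * new_part

plan = greedy_evolution(["create", "combine"], toy_cost, num_points=3)
```

With these made-up weights the cheap online operator is applied at the first point and the expensive one is deferred until new queries dominate. Enumerating subsets is exponential in the number of operators, which is one reason the search space grows quickly.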
    10. Cost Estimation Models: LAA and GAA
         • Local Adaptive Algorithm (LAA)
             • Optimize only for the current step
             • c: number of evolution points
             • n: number of operators
         • Global Adaptive Algorithm (GAA)
             • Look ahead to the target
         • Terms shown on the slide: 2^n and c * 2^n
       [Figure: crossover over operators Op1 to Op6 with two random cuts. Parent #1: 1 2 1 0 3 3. Parent #2: 3 3 1 2 2 1. Child: 3 2 1 0 2 1.]
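The figure on this slide is consistent with a two-point crossover, as used in genetic algorithms. The sketch below reproduces the child from the two parents; the cut positions (after Op1 and after Op4) are inferred from the numbers on the slide rather than stated there, and reading each gene as the evolution point assigned to an operator is an assumption.

```python
def two_point_crossover(outer, inner, cut1, cut2):
    """Child takes genes [cut1:cut2] from `inner` and the rest from `outer`."""
    if not (0 <= cut1 <= cut2 <= len(outer)) or len(outer) != len(inner):
        raise ValueError("invalid cuts or parents of different length")
    return outer[:cut1] + inner[cut1:cut2] + outer[cut2:]

# One gene per operator Op1..Op6 (values as printed on the slide)
parent1 = [1, 2, 1, 0, 3, 3]
parent2 = [3, 3, 1, 2, 2, 1]

# Middle segment (Op2..Op4) from parent #1, both ends from parent #2
child = two_point_crossover(parent2, parent1, cut1=1, cut2=4)
```

The resulting `child` is `[3, 2, 1, 0, 2, 1]`, matching the child shown on the slide.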
    11. Experiment Setting: Running on a real DBMS
         • MaxDB
             • Used by SAP applications
             • Cost based, mostly focusing on the number of pages to be loaded
         • TPC-W
             • Book store
             • Benchmark for web applications
             • 10 queries
                 • Old queries decreasing
                 • New queries increasing
    12. Speed Up Analysis: I/O cost comparison
         • ~40% performance gain when applying LAA
    13. Speed Up Analysis: LAA vs. GAA
         • More than 50% performance gain when applying GAA instead of LAA
    14. Conclusion and Future Works: Unsolved problems remain
         • We found that
             • Problems exist for long-period application upgrades
             • Schema evolution is one of the solutions
         • Continuing work on
             • How to define the evolution points
             • Schema design for more general purposes
             • On-line evolution
    15. Let's Keep in Mind: The EVOLUTION. Thanks and ?