SQL Rewriting Engine and its Applications

                     Joshwini Pereira   Tzi-cker Chiueh
"A General SQL-Rewriting Engine,"


Published in: Technology

Computer Science Department, Stony Brook University

Abstract

There have been numerous efforts directed towards improving the durability of databases. Applications such as 'Dependency Tracking', 'Query Level Database Backup' and 'Query Level Restore' provide novel ways of maintaining the consistency of the database. All these applications share a common trait: they rewrite the query issued by the user. The SQL Rewriting Engine is built to integrate the framework common to the three applications and to provide a user level API that offers all the functionality of 'Dependency Tracking', 'Query Level Database Backup' and 'Query Level Restoration'. The Engine also provides additional features such as 'Phantom Dependency Tracking', which determines phantom dependencies in addition to the regular dependencies identified. In this paper, we describe the design and implementation of the SQL Rewriting Engine and present the results of its performance evaluation. The Engine is built on the .NET framework and is implemented specifically for Oracle 10g as the backend database. Thorough evaluations have been carried out and the results obtained are described in detail.

1 Introduction

Databases are corrupted either by the actions of a malicious intruder or by common human error. Restoring the database to a prior stable state involves discarding valid transactions along with the anomalous ones. In order to carry out a selective undo, Dependency Tracking is provided. This functionality tracks the interdependence of transactions and helps to isolate the malicious transactions from the legitimate ones.

Dependency Tracking [1] identifies two types of operations: DDL and DML. Any transaction that creates or alters tables is grouped under DDL operations, and all other operations, such as Select, Delete, Update and Insert, fall into the category of DML operations. Dependency is tracked by obtaining the Ids of the transactions that had previously modified the read set of the current transaction, where the read set is the set of rows being read or modified (by DML operations) by the current transaction. The current transaction is therefore said to 'depend' on the transactions that had previously modified its read set.

Dependency Tracking does not always yield accurate results and can identify false positives [2]. Dependency Tracking therefore has two separate implementations, Row Based Dependency Tracking and Column Based Dependency Tracking, where the latter tracks dependencies at a finer granularity and eliminates false positives.

In order to simplify the restoration process, during any DML operation (usually Update and Delete) the rows being modified by the transaction are backed up before any operation is carried out on them. The backed-up copy is termed the pre-image of the rows, and the process is specifically addressed as
Query level backup. This is because the backup is carried out on a query level basis rather than by storing an entire snapshot of the database.

After database corruption is detected, the reconstruction process involves restoring the database to a prior stable state. This is made possible through a Query Level Restore process, where the database is restored in reverse temporal order and only the modified rows are updated, based on their pre-images stored by the Query Level Backup process. The row Id of the row being restored, together with the specified timestamp, forms a unique identifier that helps to restore the database to a specific point in time.

Once a set of malicious transactions has been determined, and before any restore process can be carried out to undo these specific transactions, all the transactions with dependencies on the malicious transactions need to be identified. Apart from the direct dependencies, the transactions that have phantom dependencies on the initial Undo Set also need to be determined. Phantom Dependency Tracking determines the phantom dependencies in addition to the inter-transaction dependencies and helps in identifying the correct final Undo set.

The SQL rewriting engine is built as an integral whole of three applications: Dependency Tracking, Query Level Backup and the Restoration Procedure. The common factor between the applications is that they rewrite the query issued by the user to suit their individual functionalities. The rewriting engine has an inbuilt parser that parses the incoming SQL queries and stores the parsed statements in a format that the applications can easily use. The engine offers a wide array of user level APIs, which give the user the flexibility of choosing any single functionality or of using all three functionalities as an integrated whole. Thus the SQL rewriting engine rewrites the incoming queries and returns to the user the execution sequence for the rewritten queries. This ordering helps the user to understand the exact sequence of events occurring at the backend and also gives a clear picture of the state of the database. The engine also provides the user with the option of executing the set of rewritten queries on the backend database, further simplifying the user's effort.

The Engine also provides an enhanced feature, Phantom Dependency Tracking, that can be used to determine the transactions that have phantom dependencies and that need to be undone in the face of database corruption. This auxiliary component offers additional support for identifying and recovering from failures caused by erroneous transactions.

2 Related Work

Extensive research has previously been carried out on Dependency Tracking among transactions. The papers 'A Portable Implementation Framework for Intrusion-Resilient Database Management Systems' [1] and 'Accurate Inter-Transaction Dependency Tracking for Repairable DBMS' [2], by Chiueh and Smirnov, provide lucid algorithms and solutions for accurate inter-transaction dependency tracking.

The motivation behind Dependency Tracking was to determine the relationship among the transactions throughout their execution sequence. The information obtained helps in identifying the exact set of dependencies shared among the transactions and in isolating the different dependencies observed. This proves very useful in the case of database corruption, where transactions affected by the actions of any erroneous transaction can be identified and selectively rolled back. The technique is useful because we do not have to roll back the entire database, but only a subset of erroneous transactions.
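The selective rollback described above amounts to a reachability computation over recorded dependencies. The following is a minimal sketch, not the authors' implementation: it assumes the dependencies are available as (transaction Id, dependency Id) pairs, and the function name `undo_set` and the sample Ids are hypothetical.

```python
from collections import defaultdict, deque


def undo_set(trans_dep, malicious):
    """Return every transaction that must be rolled back.

    trans_dep: iterable of (trans_id, dep_id) pairs, meaning that
               trans_id depends on dep_id.
    malicious: iterable of transaction ids known to be erroneous.
    """
    # Invert the pairs: for each transaction, who depends on it?
    dependents = defaultdict(set)
    for trans_id, dep_id in trans_dep:
        dependents[dep_id].add(trans_id)

    # Breadth-first walk from the malicious transactions.
    result = set(malicious)
    queue = deque(result)
    while queue:
        current = queue.popleft()
        for t in dependents[current]:
            if t not in result:
                result.add(t)
                queue.append(t)
    return result


if __name__ == "__main__":
    # Hypothetical history: 2 depends on 1, 3 depends on 2, 5 depends on 4.
    deps = [(2, 1), (3, 2), (5, 4)]
    print(sorted(undo_set(deps, [1])))
```

With transaction 1 marked as malicious, transactions 2 and 3 join the undo set while 4 and 5 are preserved, which is the saving over rolling back the whole database.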
The Repairable DBMS [2], as it is known, is basically a proxy between the application accessing the backend and the database being accessed. The proxy intercepts the outgoing queries from the application, transforms them based on the type of SQL query and redirects them to the backend. The transformation carried out on the query helps to maintain additional information pertaining to dependency tracking on the backend.

The dependency information tracked by this tool helps database administrators determine the exact set of transactions to be undone. With this method, the actions of valid transactions can be preserved and need not be undone in the face of database failure.

The prototype implements three variants of dependency tracking: Row based, Column based and Selective Column based tracking. Column based tracking provides a finer granularity of tracking but at the same time incurs a higher overhead. Selective Column based tracking improves on the previous method: a predetermined column set is provided and tracking is carried out only with respect to these specific attributes. This leads to a lower overhead compared to Complete Column based tracking. Row based tracking allows for a coarser level of tracking and may lead to the identification of false positives. However, Row based tracking has a lower overhead than the other methods, as the amount of tracking information to be maintained is relatively lower.

The RDB also provides for the detection of 'phantoms', where transactions do not share direct dependencies but could have had a different effect on the database if not for certain erroneous transactions. The framework detects these phantoms and provides the administrator with this additional information.

The RDB framework is built for portability and was successfully implemented for PostgreSQL, Oracle and Sybase databases. Finally, performance evaluations showed that the proxy added an overhead of nearly 6%-13% during runtime.

3 SQL Rewriting Engine Design

3.1 Architecture

The Engine is built as a client side tool that offers a rich set of APIs to the developer designing the application. Based on the functionality desired, specific APIs can be used to rewrite the outgoing SQL queries. The engine offers the developer the flexibility of either obtaining just the rewritten queries or having the rewritten queries executed on the backend. This allows the developer to choose from varied features and to incorporate specific functions into the application, making application design as flexible as possible.

The engine is designed as three distinct sections: Connection, Common and Parsing. The Connection module sets up and maintains the connection to the backend database. It also stores the connection information in the Common module, from which the other sub-modules inherit the connection details. The Common module contains information that is common to all the sub-modules, such as the connection parameters, functions to set up the
supporting tables etc.

[Figure 1 diagram: the client machine hosts the client application and the SQL Rewriting Engine; the server machine hosts the database.]

Figure 1: The SQL Rewriting Engine resides on the client side, where it either [1] transforms the query and redirects it to the backend, OR [2.1] returns the rewritten queries to the user, AND [2.2] these are subsequently executed on the server by the user.

The Parsing module functions as the parser and is built as a wrapper around the DLLs provided by the General SQL Parser [3]. It uses the primitive functions provided by the GSQL Parser to identify the type of the queries, obtain the various parts of the statement such as the table name and the where clause, and finally store the query in a format that is compatible with the rewriting engine. Additional functions, such as parsing the Alter Table command and obtaining the parameter list of a Create Table command, had to be developed as they were not supported by the General SQL Parser.

The Connection, Common and Parsing modules provide auxiliary functions to the core components of the Engine, Statement and SQLRewtStatement, from which the Dependency Tracking, Query Level Backup and Restore modules branch out.

Statement and SQLRewtStatement are the two main stubs of the rewriting engine. Each contains functions to carry out Dependency Tracking and Query Level Database Backup, but Statement provides these functionalities independently of each other, whereas SQLRewtStatement integrates them and provides the user with a common interface to both features.

This distinction was made to provide small improvements to the working of the engine. The engine requires a set of tables to store the metadata pertaining to the tables and the transactions, and the number of these tables varies with the features desired.

In the case of Statement, three tables need to be set up: Transactions, DDL_Dep and Trans_Dep.

• Transactions (Trans_ID, TS) stores information regarding the transactions, such as the Id and the timestamp when the transaction was initiated.
• Trans_Dep (Trans_ID, Dep_ID) stores the Ids of the transactions that a particular transaction depends upon.
• DDL_Dep (Trans_ID, Table_Name, Type, TS) keeps track of the transactions that carry out DDL operations and records the corresponding table and the type of operation (Create/Alter) carried out.

The Transactions table is common to both the dependency tracking and the query level backup modules, but Trans_Dep and DDL_Dep are needed by dependency tracking alone. Thus the corresponding tables are created depending on the type of functions invoked.

In the case of SQLRewtStatement, only two kinds of tables are created: Transactions (Trans_ID, TS, Dep_ID) and DDL_Dep (Trans_ID, Table_Name, Type, TS). This is because both dependency tracking and query level backup are always carried out when any function is invoked, and thus the Trans_Dep and Transactions tables can be integrated into a single table, Transactions (Trans_ID, TS, Dep_ID).

Apart from the number of tables created, the Statement and the
SQLRewtStatement modules also differ in the manner in which the statements are rewritten.

3.2 Dependency Tracking

The dependency tracking module carries out transformations on the incoming queries and obtains and maintains the dependency information among the transactions. Trans_Dep and DDL_Dep are the tables created in order to support tracking. All the dependency information is collected over the execution sequence of the transaction and is stored in the corresponding table once the transaction is ended or committed.

The tracking carried out is specifically row based dependency tracking, where dependencies are recorded based on the set of rows being modified. Dependencies exist among those transactions that update or read a set of rows that were previously created or modified by a different set of transactions.

Transformations on the queries are carried out based on the type of SQL statement. For DDL statements such as Create Table, an additional field called TRANS_ID is added to the schema of the table being created. This field records the Id of the transaction that inserted or modified the specific row. Additionally, the Id of the transaction running this command, the current timestamp, the table name and the type of command (Create/Alter) are stored in the DDL_Dep table.

For an Insert command, the query is rewritten to include the Id of the transaction that is issuing the Insert command. For an Update statement, the Ids of the transactions that had modified the read set of the query are obtained and stored in the Dep_ID field of the Trans_Dep table. The query is then rewritten to insert the Id of the transaction currently issuing the Update statement into the TRANS_ID field of the table. For a Select statement, the read set Ids are stored in the Trans_Dep table, as the transaction is reading a set of rows that were previously updated by some other transactions. Thus, with the exception of the Delete command, all the statements can be rewritten with respect to dependency tracking.

Finally, the rewritten transactions are ordered according to the sequence in which they need to be executed on the backend. Based on the feature selected by the user, the set of queries is either executed on the database or returned directly to the user.

3.4 Query Level Database Backup

Query Level Backup entails storing only the pre-images of the rows affected by a query; this selective backup leads to a smaller overhead than storing the entire database image. Fundamentally, whenever a DML statement is encountered, all the rows that are to be affected by the statement are backed up onto a separate table. At any point in time, when a rollback to a previous consistent state is required, the stored values can be restored onto the affected rows.

Query level backup also carries out a transformation or rewriting of the queries being issued to the database. For a Create command, a corresponding backup table is created as <table_name>_BAK. The backup table contains additional columns along with the attributes of the original table. The additional fields contain the Ids of the transactions modifying the set of rows, the timestamp of the operation, the row identifiers of the rows being backed up and a flag indicating whether the row was deleted or updated.

An Alter Table command is rewritten only if the command would alter the number of columns of the table, such as deletion or insertion of additional fields.

For an Update/Delete statement, all the rows to be affected by the command are stored as a backup before the actual query is executed.

Original query:  Create Table XYZ(A1 Integer, A2 Varchar(10), A3 Real)
Rewritten query: Create Table XYZ_BAK(A1 Integer, A2 Varchar(10), A3 Real, Trans_id Integer, TS Timestamp, IS_DEL Integer, Rid Rowid)

Original query:  Alter Table XYZ drop column A1
Rewritten query: Alter Table XYZ_BAK drop column A1

Original query:  Update XYZ set A3 = 1.0 where C1
Rewritten query: Insert into XYZ_BAK (Select * from XYZ where C1)

Original query:  Delete from XYZ where C2
Rewritten query: Insert into XYZ_BAK (Select * from XYZ where C2)

Figure 2:* Simple example of transformations carried out on the query with respect to query backup.

* The rewritten query is not in the exact format used by our implementation. Figure 2 is for illustration purposes only.

Query level backup can be carried out for DML statements such as Update and Delete and for DDL statements such as Create Table and Alter Table, but Select and Insert statements cannot be rewritten with respect to query level backup as they do not modify any existing rows.

3.5 Database Restore

This functionality helps to reconstruct the image of the database at a prior state by carrying out a query level restore. The restoration uses the pre-images of the rows stored in the backup tables. Based on the timestamp, all the rows affected after the specified time are retrieved from the backup table. The retrieved rows are then restored onto the original table in reverse temporal order, with the younger rows being restored before the older ones.

The restoration process involves recreating the image of the table prior to the effects of either update or delete statements. During restore, row Ids of rows retrieved from the backup table that match row Ids in the original table indicate that the row was updated. Rows in the original table are then replaced by their corresponding values in the backup table.

Row Ids from the backup table that do not match any row in the original table, along with flags indicating that the row was deleted, identify deleted rows. These rows are then inserted into the original table.

The user is required to specify a timestamp for the restore application, and all the rows that were modified after the specified point in time are restored onto the original table in reverse temporal order.

This feature gives the user the flexibility to restore the rows of only a particular table, restore rows modified by a particular transaction, restore all the rows that were updated in the original table or restore all the rows that were deleted from the original table.

4 Extensions to Dependency Tracking

4.1 Motivation

Row based dependency tracking is not a panacea for determining the exact inter-relationship shared among the various transactions. Dependency is tracked by recording the Ids of transactions that had previously modified the rows being read by the current transaction. This method of tracking can lead to the identification of false positives.

Consider a scenario where two transactions T1 and T2 both modify a table XYZ.

Row Id   A1   A2   A3
1        10   20   100
2        30   40   200

T1: Update XYZ set A1 = 20 where A3<150
T2: Update XYZ set A2 = 50 where A3<150
In the above scenario, the Id of T1 will be recorded as the modifying transaction for Row 1 because it updates an attribute of that row. When the query is issued by T2, the Id of T1 is retrieved as a transaction upon which it has a dependency. This is because T1 had previously modified the row being accessed by T2. Upon careful scrutiny, we realize that the two transactions are updating different attributes of the same row and do not share any dependency. Such a situation is termed a false positive. Row based dependency tracking can lead to similar conditions where it incorrectly identifies relationships that do not really exist.

4.2 Column Based Dependency Tracking

In order to prevent the detection of false dependencies, a finer level of tracking needs to be carried out. Column Based Dependency Tracking plays a pivotal role in eliminating false positives and identifies the exact set of dependencies among the transactions.

Tracking is carried out at a finer granularity, where transactions are tracked based on the attributes that they modify. This eliminates false positives, as only transactions that had previously modified the attribute being accessed are recorded as dependencies.

Every table created contains additional fields that store the dependency information on a per attribute basis. This is known as Complete Column Based Dependency Tracking [2], where auxiliary information is stored for every attribute in the table. For a Create Table statement, a field <attribute_name>_DEP corresponding to every attribute of the table is also created. This new field records the Id of the transaction that most recently updated the corresponding attribute of the table. For an Insert statement, the Id of the transaction issuing the command is stored in every <attribute_name>_DEP field.

For an Update statement, all the values from the <attribute_name>_DEP fields that correspond to the fields in the 'where clause' of the statement are retrieved. Next, the Id of the transaction executing the command is inserted into the corresponding 'DEP' field based on the attribute being modified. The retrieved set of Ids identifies the transactions that the current transaction has a dependency upon.

A Select statement is rewritten in a manner similar to an Update statement, but the 'DEP' fields are not updated with the Id of the transaction issuing the command.

Original query:  Create Table XYZ (A1 Integer, A2 Varchar(10), A3 Real)
Rewritten query: Create Table XYZ (A1 Integer, A2 Varchar(10), A3 Real, A1_DEP Integer, A2_DEP Integer, A3_DEP Integer)

Original query:  Insert into XYZ(A1,A2,A3) values (1, ABC,2.0)
Rewritten query: Insert into XYZ(A1,A2,A3,A1_DEP,A2_DEP,A3_DEP) values (1, ABC,2.0,Cur_TID,Cur_TID,Cur_TID)

Original query:  Update XYZ set A1 = 10 where A2 = MNO
Rewritten query: Select A2_DEP from XYZ where A2 = MNO
                 Update XYZ set A1 = 10, A1_DEP = Cur_TID where A2 = MNO

Original query:  Select A3 from XYZ where A1 = 10
Rewritten query: Select A1_DEP from XYZ where A1=10

Figure 3: An example of rewriting carried out with respect to complete column based tracking.

Thus the above method helps in isolating disjoint sets of transactions that modified the same row but different columns.

4.3 Identifying Phantom Dependencies

Column based tracking can also lead to a situation of false negatives [2]
where a transaction could have led to a different outcome if some previous transaction had not modified a certain set of rows. Though these transactions are independent of each other in that no dependency exists between them, a subtle relationship does exist in certain scenarios.

Consider the two transactions T1 and T2 updating a table XYZ.

Row Id   A1   A2
1        10   20
2        30   40

T1: Update XYZ set A3 = 500 where A2 = 40
T2: Update XYZ set A1 = 1 where A3 < 400

In this scenario, T1 updates only row 2. The read set of T2 corresponds only to row 1, and it updates the attribute A1 accordingly. If we identify T1 as an erroneous transaction, then T2 will not be present in the Undo set as it does not have a direct dependency on T1. But in the absence of T1, the outcome of T2 would have been different: it would have updated both rows. Thus T2 is said to have a phantom dependency on T1.

In the face of database corruption due to any malicious transaction, the exact set of transactions to be rolled back needs to be determined. This process involves the detection of phantom dependencies too. For this we maintain two images of the table to be rolled back: a Pre Image table and a Current Image table. The Pre Image table <original_table_name>_PRE consists of all the rows that were present in the table before any modification was made by the erroneous transaction. The Current Image table contains the effects of the execution of the erroneous transaction. To determine phantoms, the 'WHERE' clause of every transaction succeeding the erroneous transaction is executed against the Pre Image and Current Image tables. If we obtain differing result sets from the two tables, then we conclude that phantoms exist. The result sets differ if rows returned from the Pre Image table are absent from the result set of the Current Image table.

Let us assume that all the transactions belong to a universal set T. Let the set of transactions that need to be undone be represented by a set Sc. Let Sn = T – Sc represent the set of transactions that are not in the Undo set and that need to be processed before we consider undoing them. The following algorithm is used to determine whether any phantom dependency exists between the elements in Sc and Sn.

• We process the transactions in ascending order of timestamps. For every transaction T' in Sc, obtain all the columns modified by it. Add these columns to a set C if they were not already added. For any new column added to set C, add the column to the Pre Image table and restore the values of that column to the point in time before any modification was carried out by T'.
• For every transaction T'' in Sn, we run the 'WHERE' clause of the transaction against the Pre Image table and obtain a set of rows X. Next, we use the Backup table (created by the Query Level Backup function) to obtain the rows corresponding to the Current Image table, as follows. The Backup table consists of the pre-images of all rows that were modified by the transaction T'', that is, rows that satisfy the 'WHERE' clause of transaction T''. These images correspond to the effects of all previous transactions that had already modified the table prior to T''. They contain the exact sequence of events that had occurred due to the actions of the malicious transaction and hence map to the 'Current Image' of the table. All the rows 'Y' corresponding to T'' are retrieved from the Backup table, and these form the Current Image rows.
• A comparison is carried out, and if X is not equal to Y, then a phantom is detected. The transaction T'' is added to the set Sc and is processed as a regular erroneous transaction.

In order to support the efficient detection of phantoms, we made the following assumptions: every transaction modifies only a single table, and every transaction issues only a single SQL command. With these assumptions in mind, we made the following design changes.

• The TRANSACTIONS table that stored metadata pertaining to the transactions affecting the database was modified with an updated schema.
• TRANSACTIONS(TRANS_ID, TS, DEP_ID, TABLE_NAME, WHERE_CLAUSE, COL_LIST), where TRANS_ID stores the Id of the transaction issuing the query, TS corresponds to the timestamp of the operation, DEP_ID contains the set of dependencies for that transaction, TABLE_NAME corresponds to the table being accessed, WHERE_CLAUSE contains the 'WHERE' clause of the query and COL_LIST contains all the columns that were modified by the transaction.

Let us consider a simple example of two transactions T1 and T2. Let us assume that both the transactions access the table XYZ.

Row Id   A1   A2   A3
1        10   20   100
2        30   40   200

T1: Update XYZ set A3 = 500 where A2 = 40
T2: Update XYZ set A1 = 1 where A3 < 400

After the execution of the above queries, the table XYZ will have the following image.

Row Id   A1   A2   A3
1        1    20   100
2        30   40   500

The Backup table XYZ_BAK corresponding to table XYZ will appear as follows.

Row Id   A1   A2   A3    Trans_ID
2        30   40   200   1
1        10   20   100   2

Let us set Sc = {T1}, which automatically implies that T2 is in Sn. Initially, the Pre Image table XYZ_PRE will contain only the pre-image values of the attribute A3, as it was the column updated by T1, the transaction in Sc.

Row Id   A1   A2   A3
1        1    20   100
2        30   40   200

Next, the 'WHERE' clause of T2 is run against the Pre Image table XYZ_PRE, and the rows corresponding to row Ids 1 and 2 are obtained. This implies X = {1, 2}.

Then, the row corresponding to the Current Image of XYZ, that is, the backup image of the rows modified by T2 in the Backup table XYZ_BAK, is obtained, and Y is set to {1}.

Next, X and Y are compared, and we find that an additional row has been retrieved from the Pre Image table XYZ_PRE. From this we conclude that T2 has a phantom dependency on T1. T2 is then added to the set Sc and is processed as a regular erroneous transaction.
5 Performance Evaluation

5.1 Experimental Setup

The experimental setup consisted of two machines on a 100 Mbps local LAN. The machines were organized as in Figure 1 and consisted of a server and a client machine. The server housed the Oracle 10g database and ran on a Linux 2.6 FC4 box with a processor speed of 2.8 GHz, 92 GB of disk space and 1 GB of memory. The client was a Microsoft Windows XP laptop with 80 GB of hard disk capacity, a processor speed of 1.5 GHz and 512 MB of RAM. The .NET 2.0 framework was set up on the client machine to support the SQL Rewriting Engine.

5.2 TPCC Benchmark

Performance testing was carried out on the Engine by measuring the overhead incurred by employing the functions of the engine. Tests were carried out according to the TPCC Benchmarks [6], where the benchmark simulates real time transactions of a business activity for processing customer orders. The benchmark specifies a mix of read/write and read intensive workloads. The footprint size of the database was varied by altering the 'warehouse' size. The measurements for the read/write transactions and the read intensive transactions were taken separately, and the overhead was recorded for varying footprint sizes.

The read/write transactions consisted of an interspersed mix of New Order, Payment and Delivery operations. The read intensive transactions consisted of a series of read-only Stock Level queries. The initial tests were carried out with W=2, which had about 700 read/write and 3000 read intensive transactions. The subsequent tests were carried out by varying the warehouse factor in increments of 2 until W=10, which had a total of 19000 read/write transactions and 5000 read intensive transactions.

5.3 Dependency Tracking with Query Level Database Backup

The tests were carried out by invoking the Dependency Tracking and Query Level Backup functionalities from the Engine. Four tests were carried out: Column Based Dependency Tracking with Query Level Backup under read/write load, Row Based Dependency Tracking with Query Level Backup under read/write load, Column Based Dependency Tracking with Query Level Backup under read intensive load and Row Based Dependency Tracking with Query Level Backup under read intensive load. The results are shown in Figures 4 and 5 respectively. The measurements displayed indicate the ratio of increase in processing time with the invocation of functions from the Engine.

Figure 4: Row/Column Based Dependency Tracking along with Query Level Database Backup under Read/Write intensive load, for W=2 to W=10. The figure indicates the ratio of increase in processing time after employing the above mentioned functionalities.

The results indicate that there exists a very large overhead (percentages on the order of 1000s) for both column and row based dependency tracking along with query level backup. The reason for this stems from the fact that the Dependency Tracking and Query Level Backup functionalities are integrated together, and a higher load is placed on the system in terms of query interception, parsing and rewriting. Due to the additional burden of these operations, the overhead turns out to be quite overwhelming.

5.5 Phantom Dependency Tracking

The phantom dependency detection module was executed after the initial run of all the transactions. An undo set was provided as an input, based on which phantoms were detected, and the corresponding direct dependencies along with the phantoms were added to a final undo set.

The overhead of invoking the Phantom Dependency Tracking (Figure 6) was found to hover in a constant range of 100%-110%, even when larger undo sets were provided.

Results also indicate that the initial discovery of new dependencies (Figure 7) with an undo set of size 20 is quite high, but with larger undo sets the percentage decreases significantly. This is because, with larger undo sets, the same set of phantoms is likely to be detected, and this does not lead to a significant increase in the number of new dependencies discovered.

Figure 6: Phantom Detection Overhead with varying Undo Set sizes on the X axis and corresponding overhead on the Y axis.

Figure 7: Percentage of additional dependencies discovered due to the detection of phantoms given an initial Undo Set. The Undo Set sizes are varied along the X axis and the % of additional dependencies detected is recorded on the Y axis.

6 Future Work

6.1 Multi Tenant Data Architecture

Multi Tenant Data Architecture [8] is a SaaS (Software as a Service) application that aims to offer the user differing levels of flexibility for centralized control and storage of user data.
SaaS deals with the centralized management of user data where accessing of data is more efficient 80 when compared to locally installed application. The user needs to Row 60 surrender some level of control of their Based data to the SaaS vendor and the vendor 40 then needs to provide efficient measures to ensure that the user data 20 will not be compromised at any point Colum n 0 in time. Based W=2 W=4 W=6 W=8 W=10 The user can choose from varying levels of data isolation that is, they can decide the manner in which their data is to be stored. Multi Tenant Data Figure 5: Row/Column Based Dependency Tracking along with Query Level Database Backup under Read intensive load. The figure indicates the ratio of increase in overhead after employing the above mentioned functionalities.
  12. 12. Architecture (MTDS) offers three additional pointer that points to a table levels of isolation – Separate containing the metadata pertaining to Databases, Shared Database Separate that specific column. This does not Schema and Shared Database Shared restrict the tenant to a preset number of Schema. Separate Databases involves columns and extensibility is made as storing the tenant data in separate flexible as possible. Thus with these databases and is more suitable for varied features tenants are given a larger organizations that store a large choice of different levels of amount of data and who maintain customization. highly sensitive data. Shared Database In the face of corruption of a Separate Schema architecture allows particular tenant’s data, the restore for tenants to share the same database process involves the complete but maintains separate tables for restoration of the database onto a different tenants. Shared Database separate temporary database. Next, Shared Schema has tenants sharing the only the tables corresponding to the same database and same tables where a affected tenant are exported from the level of security is built into the system temporary database onto the original that prevents unauthorized access of database. This expensive procedure is other tenants’ data. The latter two carried out to prevent modification of schemes are more suitable for tenants the unaffected tenants’ data in the that prefer a lower cost of maintaining database. their data and are ready to take the risk of co-locating their data with other 6.2 Extensions to SQL customers. Rewriting Engine The two Shared approaches discussed above have a high initial setup cost because customized services The SQL Rewriting engine can be need to be built to ensure that data is further enhanced to support Multi being accessed only by authorized Tenant Data Architecture. MTDA users. 
At the same time, in case of employs costly methods to restore corruption of a particular tenant’s data, tenants’ data in the case of data a costly restoration procedure is carried corruption. We can adapt the Query out in order to prevent affecting the Level Backup functionality and the other tenants’ data. Restore functionality of the Rewriting The architecture suggests the use Engine, to carry out restore in MTDA. of Tenant View Filter for ensuring We can store the pre images of rows isolation among tenants in a shared corresponding to different tenants in approach. Here, based on the SID of the backup table prior to their the tenant, rows belonging to that modification. This data can also be tenant are retrieved. This offers a replicated on different servers to filtered view to any tenant accessing ensure durability. During the restore the database. process, only the rows corresponding The architecture also discusses to a particular tenant can be updated customization of the schema for onto the original table thus preserving various tenants accessing the Shared the data of other tenants. This selective table. Pre allocated fields and Name method of restoration will work out to Value Pairs were some of the be cheaper than the current method approaches suggested. Pre allocated employed by MTDA and can service fields consists of preset custom each tenant independent of the other. columns that any tenant can extend Features for extensibility patterns based on their requirement. Name of the MTDA can also be incorporated Value Pairs consist of two levels of into the Rewriting Engine. Based on indirection where a pointer from the the tenant issuing the query, the original table points to a table command can be transformed containing the custom field value. The depending on the custom columns custom field value table contains an created by the tenant. The rewriting
  13. 13. can be carried out for both Pre 8 References allocated fields and Name Value Pairs depending on the functionality chosen for the architecture. After analyzing the [1] Alexey Smirnov, Tzi-cker source of the query, the command can Chiueh, A Portable Implementation be split or rewritten to access different Framework for Intrusion-Resilient tables storing the custom column Database Management Systems, in information. Proceedings of DSN 2004, Florence, In this manner the tenant will not have Italy, 2004. to keep track of the complexity involved on the backend but rather the [2] Shweta Bajpai, Alexey Smirnov rewriting will be done transparently in and Tzi-cker Chiueh - Accurate Inter- order to service each customer. Thus Transaction Dependency Tracking for the Rewriting Engine can be enhanced Repairable DBMS. to incorporate additional features that can support the Multi Tenant Data [3] General SQL Parser - Architecture. http://www.sqlparser.com/ [4] Microsoft .NET 2.0 Framework - http://msdn2.microsoft.com/enus/netfra mework/default.aspx [5] Oracle 10g Database Server - http:// 7 Conclusion www.oracle.com/technology/software/ products/database/index.html The SQL rewriting engine integrates the common functionalities [6] TPCC Benchmarking - of the three different applications – http://www.tpc.org/tpcc/ Dependency tracking, Query Level database backup and the Query Level [7] C# Reference library Restore function and provides a -http://msdn2.microsoft.com/en- common framework to manipulate the us/library/default.aspx various features of these applications. The user level API offers sufficient [8] Multi Tenant Data Architecture - flexibility to the users to understand http://msdn2.microsoft.com/enus/librar and employ the features provided. y/aa479086.aspx Phantom detection allows for accurate detection of the exact set of transactions to be undone and thus proves to be a very helpful tool before any restore process. 
Results indicate that Phantom Dependency Tracking incurs a relatively low overhead for determining the final undo set. Also, a large number of new interdependencies are detected by this function and provides a more accurate picture of the transactions to be undone. The Engine can be further refined to provide additional functionalities to support Multi Tenant Data Architecture. Thus with the APIs provided, complex applications can be developed to suit the requirements of any data driven architecture.
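As an illustration of the tenant-scoped restore proposed in Section 6.2, the following Python sketch uses an in-memory SQLite database. It is a minimal sketch under assumptions, not the Engine's code: the table names `orders` and `orders_bak`, their columns, the transaction ids and the function `restore_tenant` are all hypothetical, and only updates are handled (inserted or deleted rows would need additional bookkeeping).

```python
import sqlite3


def restore_tenant(conn: sqlite3.Connection, tenant_id: int,
                   first_bad_trans_id: int) -> int:
    """Write one tenant's pre images back to the live table.

    Pre images are applied from the newest to the oldest transaction, so
    each row ends up with the value it had before the first bad transaction.
    Returns the number of pre images applied.
    """
    pre_images = conn.execute(
        "SELECT row_id, amount FROM orders_bak "
        "WHERE tenant_id = ? AND trans_id >= ? "
        "ORDER BY trans_id DESC",
        (tenant_id, first_bad_trans_id),
    ).fetchall()
    for row_id, amount in pre_images:
        # The tenant_id filter keeps other tenants' rows untouched.
        conn.execute(
            "UPDATE orders SET amount = ? WHERE row_id = ? AND tenant_id = ?",
            (amount, row_id, tenant_id),
        )
    conn.commit()
    return len(pre_images)


def demo():
    """Corrupt tenant 1, let tenant 2 make a valid update, then restore tenant 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (row_id INTEGER PRIMARY KEY,
                             tenant_id INTEGER, amount INTEGER);
        CREATE TABLE orders_bak (row_id INTEGER, tenant_id INTEGER,
                                 amount INTEGER, trans_id INTEGER);
        INSERT INTO orders VALUES (1, 1, 100), (2, 2, 200);
        -- Transaction 7 corrupts tenant 1; its pre image is saved first.
        INSERT INTO orders_bak VALUES (1, 1, 100, 7);
        UPDATE orders SET amount = 0 WHERE row_id = 1;
        -- Transaction 8 is a legitimate update by tenant 2.
        INSERT INTO orders_bak VALUES (2, 2, 200, 8);
        UPDATE orders SET amount = 250 WHERE row_id = 2;
    """)
    restore_tenant(conn, tenant_id=1, first_bad_trans_id=7)
    return conn.execute(
        "SELECT row_id, tenant_id, amount FROM orders ORDER BY row_id"
    ).fetchall()
```

Calling `demo()` restores tenant 1's row to its pre image while tenant 2's later, valid update is preserved, which is the property that makes the selective restore attractive compared with restoring the whole database to a temporary copy.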