Paper – Data Access Pattern

Data Access Pattern Classification Scheme for Avoiding Lost Updates in Transaction Processing

Fritz Laux (1), Martti Laiho (2)

(1) Dept. of Informatics, Reutlingen University, Alteburgstraße 150, 72762 Reutlingen, Germany, Tel: +49 7121 271 4019, Fax: +49 7121 271 90 4019, E-mail: Friedrich.Laux @ Reutlingen-University.DE
(2) Dept. of Business Information Technology, Haaga-Helia University of Applied Sciences, Ratapihantie 13, 00520 Helsinki, Finland, Tel: +358 9 2296 5228, E-mail: martti.laiho @ haaga-helia.fi

Abstract

Modern DBMS products eliminate the classical lost update problem by protecting the updates of an SQL transaction up to the end of the transaction's scope. However, lost updates can still happen in the scope of a user transaction if it does not map one-to-one to an SQL transaction. In spite of the server-side concurrency control applied to single SQL transactions, the DBMS cannot stop an SQL transaction from writing over committed updates written by other transactions. If the data written is based on a stale (outdated) copy of the data, then the SQL transaction commits a lost update. This can be the case, for example, when a user transaction contains a series of SQL transactions. The usual way to avoid blocking of resources and to prevent lost updates during a user transaction is to divide its data access phases into a series of short SQL transactions (free of any user intervention) and to use so-called "optimistic locking" at the client side, verifying that the updating transactions do not write over the updates of concurrent transactions. In this paper we identify possible data update patterns for the proper application of the row version verification (RVV) discipline. This classification provides two update patterns that guarantee the correct use of RVV for avoiding the lost update problem.
Efficient ways to implement row version indicators, based on server-side version stamping and used for version verification, are discussed and compared with the implementations in modern database systems. As examples we show implementations of these patterns using mainstream database systems such as Oracle, DB2, and SQL Server.

Keywords: data access, concurrency control, lost update, row version verification.

1. Lost Update Problem in Transaction Scope

A typical fault in multi-user file-based systems without proper concurrency control is the Lost Update Problem, i.e. a record x updated by some process A is overwritten by some other concurrent process B, as in the following simplified schedule 6:

rA(x), rB(x), wA(x), wB(x)
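As an illustration (not from the paper), this schedule can be simulated with plain read and write steps on a shared record; the variable names and values are our own. Both processes read the same original value, so B's write, computed from its stale read, silently overwrites A's update:

```python
# Minimal simulation of the schedule rA(x), rB(x), wA(x), wB(x).

record = {"x": 100}        # shared record; the initial value 100 is arbitrary

a_copy = record["x"]       # rA(x): A reads 100
b_copy = record["x"]       # rB(x): B reads 100 (stale once A writes)

record["x"] = a_copy + 20  # wA(x): A adds 20, record holds 120
record["x"] = b_copy - 30  # wB(x): B subtracts 30 from its stale copy, A's +20 is lost

print(record["x"])         # 70, although a serial execution would give 90
```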
Properly administered databases provide reliable storage services for the data of information systems without losing any data. However, it is the responsibility of application developers to use these reliable services, such as transactions and concurrency control, properly. We assume that the reader is familiar with the concepts of SQL transactions, with the ISO SQL standard isolation levels for tuning the concurrency control of transactions, and with the principles of database locks. Instead of the concurrency control theories presented in database textbooks, we are interested in the implementations found in today's mainstream DBMS products and in what application developers need to understand about reliable database access. Table 1 summarises the SQL isolation levels and the concurrency control implementations of modern mainstream DBMS products (6, 6, 6, 6).

Table 1: Summary of Isolation Levels and Concurrency Control Implementations

Cursor scope:
- Read Only: LSCC (SQL Server 2005): yes; MVCC (Oracle 11g): SELECT .. FOR UPDATE (X-locking of rows)
- Optimistic with values: LSCC (SQL Server 2005): yes
- Optimistic with timestamp: LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): row change timestamp
- "Scroll Locks": LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): Cursor Stability (CS)

Transaction scope:
- Read Uncommitted: LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): RU
- Read Committed: LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): CS; MVCC (SQL Server 2005, snapshot allowed): "Read Committed" snapshot; MVCC (Oracle 11g): Read Committed
- Repeatable Read: LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): RS
- Serializable: LSCC (SQL Server 2005): yes; LSCC (DB2 V9.5): RR; MVCC (SQL Server 2005, snapshot allowed): Snapshot; MVCC (Oracle 11g): "Serializable" snapshot isolation

Updates made in a transaction are protected up to the end of the transaction against overwriting by other concurrent transactions, using either locking scheme concurrency control (LSCC) or multi-versioning concurrency control (MVCC), the modern implementations behind so-called "optimistic concurrency control" systems. An MVCC system never blocks readers, but at the price that the readers may get stale data.
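The never-block-readers property of MVCC, and its stale-data price, can be observed directly in a small sketch. As an illustration only (SQLite in WAL mode stands in here for the MVCC products named above; table, column, and values are our own assumptions), a reader inside an open transaction keeps its snapshot while a concurrent writer commits a newer version:

```python
import sqlite3, tempfile, os

# Illustration: SQLite's WAL mode gives readers a stable snapshot, so a
# writer is never blocked by a reader, but the reader may see stale data.
path = os.path.join(tempfile.mkdtemp(), "mv.db")

w = sqlite3.connect(path)
w.execute("PRAGMA journal_mode=WAL")   # enable snapshot-style readers
w.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, v INTEGER)")
w.execute("INSERT INTO T VALUES (1, 10)")
w.commit()

r = sqlite3.connect(path)
r.execute("BEGIN")                     # open the reader's transaction
first = r.execute("SELECT v FROM T WHERE id = 1").fetchone()[0]   # 10

w.execute("UPDATE T SET v = 20 WHERE id = 1")
w.commit()                             # the writer is NOT blocked by the reader

second = r.execute("SELECT v FROM T WHERE id = 1").fetchone()[0]
r.execute("COMMIT")

print(first, second)                   # 10 10 -- the reader's data is stale
```

The design point matches the text: the reader pays for non-blocking reads by possibly working on an outdated copy, which is exactly what the RVV discipline later has to detect.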
In the scope of a single SQL transaction the schedule presented above is not possible in these DBMS products, so we do not have the Lost Update Problem for our own updates. However, we also need to consider series of SQL transactions in their application context, i.e. in the scope of user transactions.
2. Lost Update Problem in Application Context

Let us first consider the following problematic scenario of the SQL transactions of two concurrent processes A and B updating the balance of the same account, shown in Fig. 1.

Figure 1: A lost update scenario caused by a SELECT - UPDATE transaction (A)

The withdrawal of 200 € made by the transaction of B is overwritten by A; in other words, the update made by B in step 5 is lost in step 7 when the transaction of A overwrites the updated value with the value 900 €, which is based on stale data, i.e. the outdated value of the balance from step 3. If the transactions of A and B serialized properly, the correct balance after these transactions would be 700 €, but there is nothing the DBMS can do to protect the update of step 5: the party responsible for this lost update is the programmer of process A, who has requested a wrong isolation level from the DBMS. READ COMMITTED, which for performance reasons is the default transaction isolation level of most RDBMS products, does not protect data read by a transaction from becoming outdated right after the value has been read. A proper isolation level on LSCC systems would be REPEATABLE READ or SERIALIZABLE, which protects the values read in the transaction from becoming outdated during the transaction by holding shared locks on these rows up to the end of the transaction. The isolation service of the DBMS guarantees that the transaction either gets the ordered isolation or, in case of a serialization conflict, is rejected by the DBMS. The means of this service and the transactional outcome for the very same application code can differ between DBMS products, and even between table structures. Usually a transaction rejected due to a serialization conflict should be retried by the application; we return to this point later.
The erroneous scenario above remains the same if process A commits the transaction of steps 1 and 3 (let us call it transaction A1) in step 4 and then, for example after some user interaction, continues with another transaction A2 consisting of steps 7-8.
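For illustration, the scenario of Fig. 1 split into transactions A1 and A2 can be replayed with SQLite (an illustration vehicle, not one of the products discussed; the initial balance of 1000 € and A's withdrawal of 100 € are assumptions consistent with the values quoted above: B withdraws 200 €, A writes 900 €, and the correct serial result is 700 €):

```python
import sqlite3

# A1 reads the balance and commits; B withdraws 200 with a value-sensitive
# UPDATE; A2 then blindly writes a value computed from A1's stale copy,
# losing B's update. No isolation level can prevent this, because the read
# and the write live in two separate SQL transactions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Accounts (acctId INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO Accounts VALUES (101, 1000)")
db.commit()

# Transaction A1: read the balance, then commit (end of SQL transaction scope).
stale_balance = db.execute(
    "SELECT balance FROM Accounts WHERE acctId = 101").fetchone()[0]
db.commit()

# Transaction B: a withdrawal of 200, sensitive to the current value.
db.execute("UPDATE Accounts SET balance = balance - 200 WHERE acctId = 101")
db.commit()                      # balance is now 800

# Transaction A2: a blind write based on the stale copy read by A1.
db.execute("UPDATE Accounts SET balance = ? WHERE acctId = 101",
           (stale_balance - 100,))
db.commit()                      # balance is now 900 -- B's update is lost

final = db.execute("SELECT balance FROM Accounts WHERE acctId = 101").fetchone()[0]
print(final)                     # 900, although serial execution gives 700
```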
In this case no isolation level can help: transaction A2 makes a blind write (based on stale data, insensitive to the current value) over the balance value updated by transaction B.

3. A Taxonomy of Updates Avoiding Lost Updates

The blind write of the update transaction A2 in steps 7-8 (resulting in the lost update of transaction B) could have been avoided by any of the following types of practice:

Type 0: There is no risk of a lost update if A2 in step 7 had used the form of UPDATE which is sensitive to the current value, as B does in step 5:

UPDATE Accounts SET balance = balance - 100 WHERE acctId = :id;

Type 1: After transaction A1 has read the original row version data in step 3, transaction A2 verifies in step 7, with an additional comparison expression in the WHERE clause of the UPDATE command, that the current row version in the database is still the same as when the process previously accessed the account row, for example

UPDATE Accounts SET balance = :newBalance
WHERE acctId = :id AND (rowVersion = :old_rowVersion);

The comparison expression can be a single comparison predicate, as in the example above, where rowVersion is a column (or a pseudo-column provided by the DBMS) reflecting any changes made to the contents of the row, and :old_rowVersion is a host variable containing the value of that column when the process previously read the row. If more than one column is involved, the comparison expression can be built of version comparisons of all the columns used, based on the 3-valued logic of SQL. Since a Type 1 update transaction does not explicitly read data, there is no need to set an isolation level, and the result of the concurrency control services is the same for LSCC and MVCC based DBMS products. The result of the update depends on the RVV predicate, and the application code needs to read the updated-rows indicator from the DBMS to verify the result.

Type 2: (re-SELECT ..
UPDATE) is a variant of Type 1 in which transaction A2 first reads the current row version data from the database into a host variable current_rowVersion

SELECT rowVersion INTO :current_rowVersion
FROM Accounts WHERE acctId = :id;

and then applies the conditional update
if (current_rowVersion = old_rowVersion) then
    UPDATE Accounts SET balance = :newBalance WHERE acctId = :id;

In this case it is necessary to make sure that no other transaction changes the row between the SELECT and the UPDATE. For this purpose we need to apply a strong enough isolation level (REPEATABLE READ, SNAPSHOT, or SERIALIZABLE) or explicit row-level locking, such as Oracle's FOR UPDATE clause in the SELECT command. Since the isolation level implementations of LSCC and MVCC based DBMS products differ, the results of the concurrency services can differ: in LSCC based systems the first writer of the row, or a reader using the REPEATABLE READ or SERIALIZABLE isolation level, will usually win, whereas in MVCC based systems the first writer wins the concurrency competition.

4. RVV Discipline and Server-Side Stamping Solutions

Type 1 and Type 2 updates do not require any locking before transaction A2, and the update method is generally known as "optimistic locking" 6, but we prefer to call it the Row Version Verification (RVV) discipline. There are multiple options for row version verification, including comparison of the original contents of all or some relevant subset of the columns of the row, a checksum of these, a technical SQL column, or some technical pseudo-column maintained by the DBMS. A general solution for row version management is to include a technical row version column rv and to use a row-level trigger to increase its value automatically every time the row is updated. We call the use of a trigger, or of a technical pseudo-column, "server-side stamping", which no application can bypass, as opposed to client-side stamping using the SET clause of the UPDATE command, a discipline that every application would have to follow in that case. Row-level triggers are affordable, but have a performance cost of some percent in execution time on Oracle and DB2, whereas SQL Server does not even support row-level triggers.
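A minimal sketch of combining trigger-based server-side stamping with a Type 1 update, using SQLite as the illustration vehicle (not one of the products compared; table and column names follow the examples above, the concrete values are our own):

```python
import sqlite3

# Server-side stamping plus a Type 1 RVV update, sketched in SQLite.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Accounts (
    acctId     INTEGER PRIMARY KEY,
    balance    INTEGER,
    rowVersion INTEGER NOT NULL DEFAULT 0
);
-- Server-side stamping: a row-level trigger bumps rowVersion on every
-- update, so no application can bypass the version stamp.
CREATE TRIGGER Accounts_rv AFTER UPDATE ON Accounts
BEGIN
    UPDATE Accounts SET rowVersion = OLD.rowVersion + 1
    WHERE acctId = NEW.acctId AND rowVersion = OLD.rowVersion;
END;
INSERT INTO Accounts (acctId, balance) VALUES (101, 1000);
""")

# Read the data and remember the row version (the client-side cache).
balance, old_rv = db.execute(
    "SELECT balance, rowVersion FROM Accounts WHERE acctId = 101").fetchone()

# A concurrent transaction withdraws 200 in the meantime (rowVersion -> 1).
db.execute("UPDATE Accounts SET balance = balance - 200 WHERE acctId = 101")
db.commit()

# Type 1 update: the RVV predicate detects the stale copy, so 0 rows match
# and the application knows it must refresh instead of blindly overwriting.
cur = db.execute(
    "UPDATE Accounts SET balance = ? WHERE acctId = 101 AND rowVersion = ?",
    (balance - 100, old_rv))
print(cur.rowcount)   # 0 -> version conflict, the lost update is avoided
```

Note that the application checks the updated-rows indicator (here `cur.rowcount`), exactly as required for a Type 1 update; SQLite leaves recursive triggers off by default, so the trigger's own UPDATE does not re-fire it.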
Timestamps are typically mentioned in the database literature as a means of differentiating the update versions of a row, but our tests 6 show that, for example on a 32-bit Windows workstation with a single processor, Oracle 11g can generate up to 115 updates having the very same timestamp, and almost the same problem applies to the DATETIME of SQL Server 2005 and the TIMESTAMP of DB2 LUW 9, with the exception of the new ROW CHANGE TIMESTAMP option of DB2 9.5, which generates unique timestamp values for every update of a row having the technical TIMESTAMP column. The native TIMESTAMP data type of SQL Server is not a timestamp but a technical column which can be used to monitor the order of all row updates inside a database; we prefer to use its synonym name ROWVERSION. It provides the most effective server-side stamping method in SQL Server, although as a side effect it generates an extra U-lock, which results in a deadlock in the example of Fig. 1. In version 10 and later, Oracle provides a new pseudo-column ORA_ROWSCN for the rows of every table created using the ROWDEPENDENCIES option 6. It shows the transaction SCN number of the last committed transaction that updated the row. This provides the most effective server-side stamping method for RVV in Oracle databases, although as a harmful side effect row locking turns its value to NULL. In our "RVV Paper" 6 we have presented SQL view solutions for mapping the contents of these technical row version columns into the BIGINT data type for row version verification (RVV) at the client side.

5. Data Accesses of a Typical Use Case

A typical multi-tier architecture today makes use of the Model-View-Controller (MVC) pattern, in which the Model part (M) is responsible for accessing the database (data access). The roles of the View and Controller tiers are not in the scope of this paper; we focus only on the data access tier. Reliable data access requires SQL transactions, and various transaction patterns can be used depending on the phases of the use cases. Fig. 2 presents a generic use case (user transaction), for example the maintenance of a product inventory as part of an order entry system, or the maintenance of customer data.

Figure 2: Use case, MVC implementation and the Data Access Patterns

The numbers in Fig. 2 identify the phases of the use case scenario, in which the user first picks the proper object from a search list (phases 1 and 3), the object data is
then presented to the user on a form (phase 5), and after updating the data on the form the user presses a "save" button and the changed data is updated in the database. We now focus on the implementation of this scenario as SQL transactions in the Model tier:

Phase 2 is implemented as a READ ONLY transaction which reads some relevant attributes of objects, using selection criteria set by the user in phase 1, and returns the result set in phase 3 to the View tier for the final selection of the proper object. For MVCC systems any default isolation level will do here, since MVCC systems do not block readers. For LSCC systems the Read Uncommitted isolation level is enough, since we do not want to block concurrent transactions and it is sufficient to know that the listed objects exist. If the characteristics of the selected object have changed by phase 5, the user can return to make a new selection.

Phase 4 is implemented as a READ ONLY transaction using a "singleton SELECT", fetching all relevant attributes of the selected object to be presented to the user in phase 5 on the View tier. Obviously this transaction is not allowed to read uncommitted data. For minimal blocking of concurrent transactions we do not keep the object locked in the database; instead, phase 4 starts the "client-side scope of concurrency" as seen by the application, saving the original data (or whatever will be needed, for example just the id and the row version data) in a cache of the Model tier for the row version verification in phase 6.

Phase 5 in our scenario stands for the user interface phases on the View tier, which may take an unpredictable time to complete. It may also require additional READ ONLY lookup database transactions in the Model tier, but to keep the picture simple we have not presented them.

Phase 6 is a typical case of an updating OLTP transaction. It gets the updated data from the phase 5 View tier.
Since some other concurrent transaction may have updated the data of the object in the database during the user's "thinking time" (and potential coffee break), the current data of the object has to be compared with the original data in the cache, so that those updates are not lost by blind overwriting. Either a Type 1 or a Type 2 update will do. In the case of a Type 1 update, using an UPDATE statement with an instant row version verifying (RVV) predicate, the isolation level has no meaning, and the results of the services provided by LSCC and MVCC systems are the same. In this case we also need to check whether the UPDATE statement really affected the row. For Type 2 (SELECT .. UPDATE) we need to set at least the isolation level REPEATABLE READ, which in LSCC systems guarantees that the row version read by the SELECT command is protected by an S-lock, so that unless we happen to become the victim of an accidental deadlock we will manage to do the UPDATE part. Here LSCC systems provide the better service, since with MVCC systems we need to set the isolation level to SERIALIZABLE and we will lose the competition to any concurrent update. If the row version verification fails, i.e. the object has been changed in the meantime, then the user shall be notified (in phase 8, which is not shown in the figure) of the outdated data, and control shall return to phase 3 (in some cases perhaps to phase 1), so that the user can refresh the object data from the database for a new update. Whenever the update transaction requires the use of multiple SQL commands, which is the case in a Type 2 update (SELECT .. UPDATE), or if the data of the object to be
updated is actually stored in multiple tables, it is possible that the transaction fails due to a concurrency conflict, for example a deadlock. Usually concurrency conflicts can be solved by applying the Retryer transaction pattern 6, but as we need to avoid losing the updates of concurrent transactions, this may not be the proper solution. At the end of a successful phase 6 we should refresh the object version data in the Model cache, in case the user continues processing the object data. A committed transaction cannot be rolled back, but database textbooks discuss possible compensating transactions 6, which by reverse update statements restore the object data to its original state of phase 4 (for which we would need a copy of the original data). This is presented as step 7 in the figure; it is not guaranteed to succeed, since concurrent transactions may already have affected the situation, and there remains the question of the "lost update problem" for those concurrent updates. Sometimes compensation may be possible apart from the database transactions, based on pure business rules: for example, we may cancel a hotel or ticket reservation based on our own reservation number, a kind of business-level locking.

6. Conclusion

The concurrency control of a DBMS treats SQL transactions without their application context, and this is also the typical scope of database textbooks in teaching transaction programming. We see the need to expand this scope to the application level, to the typical user transactions which form the context of SQL transactions. Even if the widely accepted application design patterns of GoF 6 do not even mention database transactions, we can identify and build practical data access patterns to be applied in teaching data access technologies and in application development using modern DBMS products, for the benefit of database professionals and the industry.
Modern application architectures have introduced new practices and needs which have made some practices of earlier SQL programming obsolete, such as holdable cursors when connection pooling is used. For example, we cannot apply the optimistic locking services of cursors across the series of transactions inside a user transaction. The new programming paradigms of the Java world and .NET also provide new possibilities and challenges. We will discuss these in our presentation "Data Access using RVV Discipline and Persistence Middleware" at eRA-3.

References

[1] J. Gray, A. Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann, 1993
[2] G. Weikum, G. Vossen, "Transactional Information Systems", Morgan Kaufmann, 2002
[3] E. Gamma et al., "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley, 1994
[4] C. Nock, "Data Access Patterns", Addison-Wesley, 2004
[5] M. Laiho, F. Laux, "On Row Version Verifying (RVV) Data Access Discipline for avoiding Lost Updates", http://www.DBTechNet.org/papers/RVV_Paper_080709.pdf
[6] Oracle, "SQL Language Reference 11g Release 1 (11.1)", B28286-01, July 2007
[7] Microsoft, "SQL Server 2005 Books Online", http://msdn.microsoft.com/en-gb/library/ms130214.aspx
[8] IBM, "DB2 Version 9.5 for Linux, UNIX, and Windows, SQL Reference, Volume 1", SC23-5861-010, March 2008