Transaction Timestamping in Temporal Databases
 


    • German Shegalov: Transaction Timestamping in Temporal Databases. FR Informatik, Graduiertenkolleg lecture series (Ringvorlesung), May 26th, 2003. Based on the research by D. Lomet, C. Jensen and R. Snodgrass
    • Outline
      • Introduction: conventional vs. temporal
        • Temporal databases: valid-time, transaction-time
        • Databases and transactions (ACID principles)
        • (Optimistic) concurrency control: TO, …
        • (Pessimistic) concurrency control: 2PL, …
      • Timestamping in transaction-time databases
        • Timestamping and strong 2PL (SS2PL)
        • Timestamping in a distributed setting (2PC)
        • Timestamping since SQL-92 (CURRENT_TIME)
    • Conventional vs. Temporal DB
      • A conventional DB captures only the most current state of the modeled world
        • e.g. current account balance, employee's salary
      • A temporal DB supports a time domain and is thus able to manage time-varying data
        • real-time stock quotes
        • employees' salaries between 1997 and 2000
    • Notions of Time
      • Transaction-time
        • the time when a fact is stored in the database; it enables as-of queries
      • Valid-time
        • the time when a fact is effective (valid) in the modeled reality
      • Bitemporal databases support both of the above
    • Transaction (ACID contract)
      • Atomicity (all or nothing in case of a failure)
        • begin;
        • acc1 -= money; acc2 += money;
        • commit;
      • Consistency
        • roll back updates upon a failed consistency check
      • Isolation
        • mask inconsistent intermediate states resulting from concurrent execution
      • Durability
        • committed updates must be failure-resilient
    • Transaction Isolation (initially x=0, y=0)
      • Lost Update: r1(x=0) r2(x=0) w2(x=x+20) w1(x=x+10) yields x=10 instead of x=30
      • Dirty Read: w1(x=10) r2(x=10) abort1 (= w1⁻¹(x), restoring x=0) w2(x=x+10) yields x=20 instead of x=10
      • Inconsistent Read: r1(x=0) w2(x=5) w2(y=10) r1(y=10), so X1 sees x=0, y=10, a state that never existed
      • Read/Write, Write/Read and Write/Write pairs on the same item do not commute
    • CC Protocols
      • Basic Timestamp Ordering (BTO)
        • each transaction i obtains a timestamp t_i right away
        • operations are executed in the scheduled order
        • r_i(x): if t_i ≥ w-time(x) then schedule else abort i
        • w_i(x): if t_i ≥ max{w-time(x), r-time(x)} then schedule else abort i
      • Two-Phase Locking (2PL)
        • prior to executing an operation, an appropriate lock is requested
        • no further lock requests after any lock has been released
        • a lock is granted when no conflicting locks are already present
        • otherwise the requester waits and an edge is added to the wait-for graph (WFG)
        • generally outperforms BTO
      • Strong 2PL (SS2PL)
        • locks are held until commit (IBM DB2, MS SQL Server, …)
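The two BTO rules can be sketched as a small single-threaded scheduler. This is an illustration under stated assumptions, not a reference implementation: integer timestamps, an in-memory item table, and no commit, recoverability or restart handling; the names `Abort`, `Item` and `BTOScheduler` are hypothetical. A read also advances r-time(x), which the write rule relies on.

```python
from dataclasses import dataclass


class Abort(Exception):
    """Raised when BTO rejects an operation (the transaction must restart)."""


@dataclass
class Item:
    value: int = 0
    w_time: int = 0  # w-time(x): largest timestamp of a transaction that wrote x
    r_time: int = 0  # r-time(x): largest timestamp of a transaction that read x


class BTOScheduler:
    def __init__(self) -> None:
        self.items: dict[str, Item] = {}

    def _item(self, name: str) -> Item:
        return self.items.setdefault(name, Item())

    def read(self, ts: int, name: str) -> int:
        item = self._item(name)
        if ts < item.w_time:  # r_i(x) requires t_i >= w-time(x)
            raise Abort(f"read of {name} by transaction {ts} comes too late")
        item.r_time = max(item.r_time, ts)
        return item.value

    def write(self, ts: int, name: str, value: int) -> None:
        item = self._item(name)
        if ts < max(item.w_time, item.r_time):  # w_i(x) requires t_i >= max{w-time, r-time}
            raise Abort(f"write of {name} by transaction {ts} comes too late")
        item.w_time = ts
        item.value = value


# Replay the lost-update schedule with t_1 = 1 and t_2 = 2.
s = BTOScheduler()
s.read(1, "x")          # r1(x)
s.read(2, "x")          # r2(x)
s.write(2, "x", 20)     # w2(x=x+20) is accepted
try:
    s.write(1, "x", 10)  # w1(x=x+10): t_1 < r-time(x) = 2, so X1 is aborted
except Abort as e:
    print("abort:", e)
```

Instead of silently overwriting X2's update, BTO aborts X1, which then has to restart with a new timestamp. This is the restart cost that the later slides weigh against SS2PL.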
    • TT Database Semantics
      • Each record version carries a timestamp
      • Insert creates a new record
      • Update inserts a new record version
      • Delete inserts an empty record version (delete-stub) for the record being deleted
      • Timeslice Q(t) executes Q against the DB as of time t
        • returns, for each qualifying record, the latest version with timestamp ≤ t, unless it is a delete-stub
        • implies that timestamp order must agree with serialization order
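A minimal in-memory sketch of these semantics follows. It assumes integer timestamps and that versions are appended in timestamp order. The class `TransactionTimeTable`, its methods and the `DELETE_STUB` marker are hypothetical names, and the optional predicate stands in for the query Q.

```python
from bisect import bisect_right

DELETE_STUB = object()  # marker for an empty (deleted) record version


class TransactionTimeTable:
    """Append-only store: insert, update and delete all add a timestamped version."""

    def __init__(self) -> None:
        self.versions: dict[str, list[tuple[int, object]]] = {}  # key -> ascending (ts, value)

    def _append(self, key: str, ts: int, value: object) -> None:
        history = self.versions.setdefault(key, [])
        if history and ts < history[-1][0]:
            raise ValueError("versions must be appended in timestamp order")
        history.append((ts, value))

    def insert(self, key: str, ts: int, value: object) -> None:
        self._append(key, ts, value)

    def update(self, key: str, ts: int, value: object) -> None:
        self._append(key, ts, value)  # never overwrites: adds a new version

    def delete(self, key: str, ts: int) -> None:
        self._append(key, ts, DELETE_STUB)

    def timeslice(self, t: int, predicate=lambda key, value: True) -> dict:
        """Q(t): latest version with timestamp <= t per record, skipping delete-stubs."""
        result = {}
        for key, history in self.versions.items():
            i = bisect_right([ts for ts, _ in history], t)  # number of versions with ts <= t
            if i == 0:
                continue  # the record did not exist yet at time t
            value = history[i - 1][1]
            if value is not DELETE_STUB and predicate(key, value):
                result[key] = value
        return result


table = TransactionTimeTable()
table.insert("alice", 1, 100)
table.insert("bob", 2, 90)
table.update("alice", 5, 120)
table.delete("bob", 6)
print(table.timeslice(3))  # {'alice': 100, 'bob': 90}
print(table.timeslice(6))  # {'alice': 120}
```

The timeslice is only meaningful if the timestamps order the versions the way the transactions were serialized, which is exactly the requirement stated on the slide.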
    • Timestamp Selection (simple)
      • BTO provides a proper timestamp order automatically
        • but it causes too many transaction restarts
      • SS2PL: for any conflicting operations p_i(x) < q_j(x):
        • pl_i(x) < p_i(x) < pul_i(x) < c_i < ql_j(x) < qul_j(x) < c_j (pl/pul: lock/unlock for p)
        • so the commit order agrees with the serialization order
        • choose the commit time as the timestamp
        • timestamps are not used for CC, so they impose no additional restrictions on concurrency
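The claim can be checked on a concrete history with a short sketch. It assumes a history encoded as `(op, transaction, item)` tuples (`"r"`, `"w"` or `"c"` for commit), no aborts, and the position of each commit in the history as a logical commit time. All function names are hypothetical. `ss2pl_admits` tests whether every lock request could be granted immediately while locks are held until commit, and `conflict_edges` collects the serialization-order constraints.

```python
from itertools import combinations


def ss2pl_admits(history):
    """True if each operation could get its lock while all locks are held to commit."""
    readers, writers = {}, {}  # item -> set of uncommitted transactions holding a lock
    for kind, txn, item in history:
        if kind == "c":
            for holders in (*readers.values(), *writers.values()):
                holders.discard(txn)  # SS2PL: release all locks at commit
            continue
        other_writers = writers.get(item, set()) - {txn}
        other_readers = readers.get(item, set()) - {txn}
        if other_writers or (kind == "w" and other_readers):
            return False  # the lock request would have to wait
        (readers if kind == "r" else writers).setdefault(item, set()).add(txn)
    return True


def commit_timestamps(history):
    """Logical commit time = position of the commit in the history."""
    return {txn: pos for pos, (kind, txn, _) in enumerate(history) if kind == "c"}


def conflict_edges(history):
    """(Ti, Tj) whenever an operation of Ti precedes a conflicting operation of Tj."""
    ops = [op for op in history if op[0] != "c"]
    return {
        (t1, t2)
        for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2)
        if t1 != t2 and x1 == x2 and "w" in (k1, k2)
    }


history = [("w", "T1", "x"), ("r", "T2", "y"), ("c", "T1", None),
           ("r", "T2", "x"), ("w", "T2", "x"), ("c", "T2", None)]
ts = commit_timestamps(history)
print(ss2pl_admits(history))                                     # True
print(all(ts[a] < ts[b] for a, b in conflict_edges(history)))    # True
```

Every conflict edge points from an earlier to a later commit timestamp. This is why commit-time timestamping is consistent with serialization order under SS2PL and needs no extra concurrency control.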
    • Two-Phase Commit (2PC) (timeline of Coordinator, DB1, DB2)
      • Coordinator: force-log begin; send prepare to DB1 and DB2
      • DB1, DB2: force-log prepared; vote yes
      • Coordinator: force-log commit; send commit to DB1 and DB2
      • DB1, DB2: force-log commit; send ack
      • Coordinator: force-log end
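The message flow above can be sketched as plain function calls. The simplifying assumptions are that logs are Python lists standing in for forced stable logs and that there are no failures or timeouts. `Participant` and `two_phase_commit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool = True
    log: list[str] = field(default_factory=list)  # stands in for the forced log

    def prepare(self) -> bool:
        if not self.can_commit:
            self.log.append("abort")
            return False               # vote "no"
        self.log.append("prepared")    # force-log prepared, then vote "yes"
        return True

    def finish(self, decision: str) -> str:
        self.log.append(decision)      # force-log commit (or abort)
        return "ack"


def two_phase_commit(coord_log: list[str], participants: list[Participant]) -> str:
    coord_log.append("begin")                          # force-log begin
    votes = [p.prepare() for p in participants]        # phase 1: prepare / vote
    decision = "commit" if all(votes) else "abort"
    coord_log.append(decision)                         # the decision is now durable
    # phase 2: only participants that voted yes are still waiting for the outcome
    acks = [p.finish(decision) for p, ok in zip(participants, votes) if ok]
    if all(a == "ack" for a in acks):
        coord_log.append("end")
    return decision


coordinator_log = []
db1, db2 = Participant("DB1"), Participant("DB2")
print(two_phase_commit(coordinator_log, [db1, db2]))  # commit
```

Each site commits at its own point in time after receiving the decision, which is the root of the timestamping issues discussed next.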
    • Timestamping Issues in 2PC
      • Problem
        • network latencies and loosely synchronized clocks
        • commit points differ across sites
        • possibly max_commit_time < begin_time, as perceived by the user
      • Observation
        • once X is prepared at a site, all concurrent transactions conflicting with X there will commit after X (X still holds its locks)
      • Solution
        • each database i votes with EARLIEST_i, its earliest acceptable timestamp, which it advances after logging prepared
        • the coordinator commits with max{EARLIEST_i, begin_time} over all sites i
    • 2PC for Transaction-Time DB (example: begin_time = 10, EARLIEST_1 = 8, EARLIEST_2 = 10)
      • Coordinator: force-log begin(10); send prepare to DB1 and DB2
      • DB1: force-log prepared; EARLIEST_1++; vote yes(9)
      • DB2: force-log prepared; EARLIEST_2++; vote yes(11)
      • Coordinator: force-log commit(11), since max{9, 11, 10} = 11; send commit(11) to DB1 and DB2
      • DB1, DB2: force-log commit(11); send ack
      • Coordinator: force-log end
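The timestamp vote can be sketched on top of a minimal 2PC. The assumptions are integer clock ticks (so "++" adds 1) and that all sites vote yes. `TTSite` and `tt_two_phase_commit` are hypothetical names. The numbers reproduce the example on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class TTSite:
    name: str
    earliest: int                          # EARLIEST_i at this site
    log: list[str] = field(default_factory=list)

    def prepare(self) -> int:
        self.log.append("prepared")        # force-log prepared ...
        self.earliest += 1                 # ... then EARLIEST_i++ (one clock tick)
        return self.earliest               # vote yes(EARLIEST_i)

    def commit(self, ts: int) -> str:
        self.log.append(f"commit({ts})")   # force-log commit(ts)
        return "ack"


def tt_two_phase_commit(begin_time: int, sites: list[TTSite]) -> int:
    votes = [s.prepare() for s in sites]           # phase 1 (all sites vote yes here)
    commit_ts = max([begin_time, *votes])          # max{EARLIEST_i, begin_time}
    acks = [s.commit(commit_ts) for s in sites]    # phase 2: same timestamp everywhere
    assert all(a == "ack" for a in acks)
    return commit_ts


db1, db2 = TTSite("DB1", earliest=8), TTSite("DB2", earliest=10)
print(tt_two_phase_commit(10, [db1, db2]))  # 11
```

All sites stamp X with the same value, which is no earlier than the begin time and no earlier than any site's acceptable timestamp.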
    • Timestamping since SQL-92
      • An SQL query can ask for the current time at some precision: year, month, day, …, millisecond
      • SQL-92 explicitly requires the current-time value to be fixed only within a single SQL statement
      • In a TT database, a transaction logically takes place at a single point in time
        • so the current-time value must not change until commit
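A minimal sketch of the transaction-level rule: the first CURRENT_TIME request fixes the value and every later statement of the same transaction sees it. The `Transaction` class and its injectable `clock` parameter are hypothetical. The clock is injected only so that the behaviour can be shown deterministically.

```python
import time
from typing import Callable, Optional


class Transaction:
    """Hypothetical transaction whose CURRENT_TIME stays fixed until commit."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._current_time: Optional[float] = None

    def current_time(self) -> float:
        # The first request fixes the value; later statements reuse it.
        if self._current_time is None:
            self._current_time = self._clock()
        return self._current_time


ticks = iter([100.0, 105.0])                    # a fake clock that advances per call
tx = Transaction(clock=lambda: next(ticks))
print(tx.current_time(), tx.current_time())     # 100.0 100.0
```

Under the weaker SQL-92 guarantee, a second statement could observe the later reading (105.0) instead.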
    • "Current Time" Matters
      • X1 reads y as of t_current, although that version is no longer current
      • X3 updates the current y, which X1 has not locked (e.g. a stock goes up enormously)
      • some time later: was X1 aware of X3?!
        • based on transaction timestamps: ct1 > ct3 ⇒ YES!
        • in fact: NOT GUILTY!
      • the current time determines the user-perceived transaction time
      (Figure: timeline of X1 (fix t_current, r(y0), buy, commit at ct1), X2 (w2(y2), commit at ct2) and X3 (w3(y3), commit at ct3))
    • Inconsistent Timeslice
      • SS2PL accepts the schedule shown
      • X1 reads y from X2 (hence c2 < c1) ⇒ serialization order X2 < X1
      • timeslice(2) = {(x,1), (y,0), (z,2)}, taken after time 8, is transaction-inconsistent: this state has never been current
      • Reason: t_X2 > t_X1 although X2 < X1
      Schedule (initially x=0, y=0, z=0):
      1: X1 fixes t1_current = 1
      2: w1(x=1)
      3: X2 fixes t2_current = 3
      4: w2(y=1)
      5: c2
      6: r1(y=1)
      7: w1(z=2)
      8: c1
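The inconsistency can be reproduced by stamping each transaction's versions with its fixed current time (t_X1 = 1, t_X2 = 3). The sketch assumes initial versions carry timestamp 0. The names `versions`, `timeslice` and `serial_states` are illustrative.

```python
# Committed versions (timestamp, value), ascending; initial versions assumed stamped 0.
versions = {
    "x": [(0, 0), (1, 1)],  # w1(x=1), stamped t_X1 = 1
    "y": [(0, 0), (3, 1)],  # w2(y=1), stamped t_X2 = 3
    "z": [(0, 0), (1, 2)],  # w1(z=2), stamped t_X1 = 1
}


def timeslice(t: int) -> dict[str, int]:
    """Latest committed value of every item with timestamp <= t."""
    return {item: [v for ts, v in hist if ts <= t][-1] for item, hist in versions.items()}


# The only database states that ever existed under the serialization order X2 < X1:
serial_states = [
    {"x": 0, "y": 0, "z": 0},  # before both
    {"x": 0, "y": 1, "z": 0},  # after X2
    {"x": 1, "y": 1, "z": 2},  # after X2, then X1
]

print(timeslice(2))                   # {'x': 1, 'y': 0, 'z': 2}
print(timeslice(2) in serial_states)  # False: this state was never current
```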
    • Unrepeatable Timeslice
      • writers after a timeslice have to commit with a later timestamp than that of the concurrent timeslicing transaction
      Schedule (initially y=0):
      1: X2 fixes t2_current
      2: X1 fixes t1_current
      3: X1: timeslice(t1_current) returns y=0
      4: c1
      5: w2(y=1)
      6: c2 (with timestamp t2_current < t1_current)
      7: X3: timeslice(t1_current) returns y=1
      8: c3
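One way to enforce the rule for a single item is to remember the latest time the item was timesliced at and to reject later writers whose timestamp does not exceed it. The sketch assumes integer timestamps and an initial version stamped 0. `VersionedItem` and its field `sliced_upto` are hypothetical names. The writer is also required to be later than the current version, which keeps the versions ordered.

```python
class Abort(Exception):
    pass


class VersionedItem:
    """One data item with committed versions (timestamp, value), ascending."""

    def __init__(self, initial: int) -> None:
        self.versions = [(0, initial)]  # initial version assumed stamped 0
        self.sliced_upto = 0            # latest time this item was timesliced at

    def timeslice(self, t: int) -> int:
        self.sliced_upto = max(self.sliced_upto, t)
        return [v for ts, v in self.versions if ts <= t][-1]

    def commit_write(self, ts: int, value: int) -> None:
        # A writer after a concurrent timeslice must commit with a later timestamp.
        if ts <= max(self.sliced_upto, self.versions[-1][0]):
            raise Abort(f"timestamp {ts} is not later than timeslice time {self.sliced_upto}")
        self.versions.append((ts, value))


y = VersionedItem(0)
print(y.timeslice(2))      # X1 at t1_current = 2 sees 0
try:
    y.commit_write(1, 1)   # X2 with t2_current = 1 would change the past
except Abort as e:
    print("abort:", e)
print(y.timeslice(2))      # X3 at the same time still sees 0
```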
    • Solution Requirements
      • SS2PL remains the primary CC mechanism
        • to keep the likelihood of transaction aborts low
      • If X has t_current = t, then X has started and not yet committed at time t
      • For X1 and X2 with t1_current < t2_current:
        • X1 must not see X2's updates
        • there exists an equivalent serial schedule with X1 < X2
    • Algorithm Design
      • each data item d has a write timestamp d.TT
      • and a read timestamp d.T_R in volatile memory
      • reads and writes raise the lower transaction-time bound t_l
        • initially t_l := t_s (transaction start time)
      • V_R (initially ∅): the transaction's volatile read set
      • V_I (initially ∅): the transaction's volatile write set (newly inserted versions)
      • t_X: the timestamp of transaction X
    • Before t_X Assignment
      • Read(d): /* sync t_X with conflicting write */
        • t_l := max{t_l, d.TT} /* prevent t_X ≤ d.TT */
        • V_R := V_R ∪ {d} /* will have to update d.T_R */
      • Write(d): /* sync t_X with conflicting write and read */
        • t_l := max{t_l, d.T_R, d.TT} /* prevent t_X ≤ d.TT and t_X ≤ d.T_R */
        • V_I := V_I ∪ {d} /* will have to update d.TT */
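The bookkeeping before t_X is fixed can be sketched as follows. It assumes integer timestamps and identifies data items by name. `DataItem` and `Txn` are hypothetical names, and the read and write sets are plain Python sets.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    tt: int = 0   # d.TT: timestamp of the current version's writer
    t_r: int = 0  # d.T_R: latest timestamp of a committed reader (volatile)


@dataclass
class Txn:
    t_s: int                                     # transaction start time
    t_l: int = field(init=False)                 # lower bound for t_X
    v_r: set[str] = field(default_factory=set)   # V_R
    v_i: set[str] = field(default_factory=set)   # V_I

    def __post_init__(self) -> None:
        self.t_l = self.t_s                      # initially t_l := t_s

    def read(self, name: str, d: DataItem) -> None:
        self.t_l = max(self.t_l, d.tt)           # prevent t_X <= d.TT
        self.v_r.add(name)                       # d.T_R to be updated at commit

    def write(self, name: str, d: DataItem) -> None:
        self.t_l = max(self.t_l, d.t_r, d.tt)    # prevent t_X <= d.TT and t_X <= d.T_R
        self.v_i.add(name)                       # d.TT to be set at commit


items = {"a": DataItem(tt=5), "b": DataItem(tt=3, t_r=8)}
x = Txn(t_s=4)
x.read("a", items["a"])   # t_l becomes 5
x.write("b", items["b"])  # t_l becomes 8 because a committed reader of b has T_R = 8
print(x.t_l)              # 8
```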
    • Timestamp t_X Assignment
      • if because of a CURRENT_TIME request:
        • t_X := t_current /* safe because t_current > t_l */
      • if immediately before COMMIT:
        • t_X := t_l++ /* smallest possible time greater than t_l */
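The two assignment paths can be sketched with an injectable clock. The sketch assumes integer clock ticks, so "t_l++" is t_l + 1 (`TICK` is an assumed constant). `Txn` and its methods are hypothetical names. The assertion only restates the slide's safety argument; it is not a recovery strategy.

```python
from typing import Callable, Optional

TICK = 1  # assumed clock granularity: "t_l++" means t_l + TICK


class Txn:
    """Hypothetical transaction with lower bound t_l and a lazily fixed timestamp t_X."""

    def __init__(self, t_s: int, clock: Callable[[], int]) -> None:
        self.t_l = t_s
        self.t_x: Optional[int] = None
        self._clock = clock

    def current_time(self) -> int:
        if self.t_x is None:                # CURRENT_TIME request: t_X := t_current
            t_current = self._clock()
            assert t_current > self.t_l     # the slide's "safe because t_current > t_l"
            self.t_x = t_current
        return self.t_x

    def commit_timestamp(self) -> int:
        if self.t_x is None:                # not fixed yet: t_X := t_l++
            self.t_x = self.t_l + TICK
        return self.t_x


a = Txn(t_s=10, clock=lambda: 42)
a.t_l = 15                   # raised by earlier reads/writes
print(a.commit_timestamp())  # 16
b = Txn(t_s=10, clock=lambda: 42)
print(b.current_time())      # 42, and it stays 42 until commit
```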
    • "Who comes too late …"
      • … will be punished by the scheduler (once t_X has been assigned)
      • Read(d):
        • if t_X < d.TT then abort X
        • else V_R := V_R ∪ {d} /* as before */
      • Write(d):
        • if t_X < max{d.T_R, d.TT} then abort X
        • else V_I := V_I ∪ {d} /* as before */
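Before and after assignment can be combined in one access path: while t_X is open, accesses raise t_l, and once t_X is fixed, a conflicting access that comes too late aborts X. This is a sketch assuming integer timestamps. `Abort`, `DataItem` and `Txn` are hypothetical names, and the comparisons copy the slide's rules literally.

```python
from dataclasses import dataclass
from typing import Optional


class Abort(Exception):
    pass


@dataclass
class DataItem:
    tt: int = 0   # d.TT
    t_r: int = 0  # d.T_R


class Txn:
    def __init__(self, t_s: int) -> None:
        self.t_l = t_s
        self.t_x: Optional[int] = None   # set by a CURRENT_TIME request or at commit
        self.v_r: set[str] = set()
        self.v_i: set[str] = set()

    def read(self, name: str, d: DataItem) -> None:
        if self.t_x is None:
            self.t_l = max(self.t_l, d.tt)             # before assignment: raise t_l
        elif self.t_x < d.tt:
            raise Abort(f"read({name}) comes too late")
        self.v_r.add(name)

    def write(self, name: str, d: DataItem) -> None:
        if self.t_x is None:
            self.t_l = max(self.t_l, d.t_r, d.tt)
        elif self.t_x < max(d.t_r, d.tt):
            raise Abort(f"write({name}) comes too late")
        self.v_i.add(name)


tx = Txn(t_s=10)
tx.t_x = 12                          # fixed by an earlier CURRENT_TIME request
tx.read("a", DataItem(tt=11))        # fine: 12 >= 11
try:
    tx.write("b", DataItem(tt=5, t_r=13))  # a reader with T_R = 13 came first
except Abort as e:
    print("abort:", e)
```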
    • Optimization I (Precision)
      • a user-specified current-time precision allows a broader range of acceptable timestamps
        • e.g. at year precision, "now" and Dec 31st 2003, 23:59:59.999 are still the same
      • instead of t_X := t_current: t_X ∈ (t_l, t_h], where t_h = max(t_current, p) is the latest time equal to t_current at precision p
      • allow data access as before, thus potentially increasing t_l
      • if t_l ≥ t_h then abort X
      • if X could be completely executed: t_X := t_l++
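The acceptable range can be computed with the standard `datetime` module. The sketch assumes a millisecond clock (`TICK`), timestamps already at that resolution, and only three precisions. `granule_end` (computing t_h) and `commit_timestamp` are hypothetical names.

```python
from datetime import datetime, timedelta

TICK = timedelta(milliseconds=1)  # assumed clock granularity for "t_l++"


class Abort(Exception):
    pass


def granule_end(t: datetime, precision: str) -> datetime:
    """t_h: latest instant (at TICK resolution) with the same value as t at this precision."""
    if precision == "year":
        next_start = datetime(t.year + 1, 1, 1)
    elif precision == "month":
        next_start = datetime(t.year + (t.month == 12), t.month % 12 + 1, 1)
    elif precision == "day":
        next_start = datetime(t.year, t.month, t.day) + timedelta(days=1)
    else:
        raise ValueError(f"unsupported precision: {precision}")
    return next_start - TICK


def commit_timestamp(t_l: datetime, t_current: datetime, precision: str) -> datetime:
    t_h = granule_end(t_current, precision)
    if t_l >= t_h:
        raise Abort("no acceptable timestamp left in (t_l, t_h]")
    return t_l + TICK  # t_X := t_l++, which lies in (t_l, t_h]


t_cur = datetime(2003, 5, 26, 14, 0)
print(granule_end(t_cur, "year"))                          # 2003-12-31 23:59:59.999000
print(commit_timestamp(datetime(2003, 6, 1), t_cur, "year"))
```

At year precision, X may even end up with a timestamp later than the moment it asked for the current time, as long as the reported year is unchanged. At day precision, the same t_l would force an abort.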
    • Optimization II ( RTT )
      • there is no way to maintain d.T_R for every data item in main memory
      • instead: a fixed-size hash table RTT of read timestamps, e.g. with 1024 entries
      • D_i := {d | hash(d) = i} for i in 1 … 1024
      • trade-off: RTT size vs. read-timestamp accuracy
      • on Write(d), RTT is checked immediately:
        • t_l := max{t_l, RTT[hash(d)], d.TT}
      • redefine V_R as a 1024-bit vector with
        • V_R[i] = 1 if some d with hash(d) = i has been read
        • 128 bytes of overhead to track accessed data items
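A sketch of the RTT variant follows. It assumes string keys, CRC-32 (`zlib.crc32`) as an arbitrary stand-in for hash(d), and 0-based buckets instead of 1 … 1024. `bucket` and `RTTTxn` are hypothetical names. V_R is a 128-byte `bytearray` used as a 1024-bit vector.

```python
import zlib

RTT_SIZE = 1024  # number of RTT entries, as in the example above


def bucket(key: str) -> int:
    """hash(d); CRC-32 is an arbitrary choice here, and buckets are 0-based."""
    return zlib.crc32(key.encode()) % RTT_SIZE


class RTTTxn:
    def __init__(self, t_s: int, rtt: list[int]) -> None:
        self.t_l = t_s
        self.rtt = rtt                         # shared, volatile read-timestamp table
        self.v_r = bytearray(RTT_SIZE // 8)    # V_R: 1024 bits = 128 bytes

    def read(self, key: str, tt: int) -> None:
        self.t_l = max(self.t_l, tt)           # t_l := max{t_l, d.TT}
        i = bucket(key)
        self.v_r[i >> 3] |= 1 << (i & 7)       # V_R[hash(d)] := 1

    def write(self, key: str, tt: int) -> None:
        # RTT[hash(d)] replaces d.T_R; it can only overestimate (items share buckets)
        self.t_l = max(self.t_l, self.rtt[bucket(key)], tt)


rtt = [0] * RTT_SIZE
rtt[bucket("y")] = 7      # some committed reader of a key in y's bucket had t_X = 7
tx = RTTTxn(t_s=3, rtt=rtt)
tx.read("x", tt=2)
tx.write("y", tt=5)
print(tx.t_l)             # 7
```

A smaller table merges more items into each bucket, so t_l is raised more often than strictly necessary. This is the size vs. accuracy trade-off named on the slide.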
    • Commit Processing
      • /* update volatile RTT */ for i := 1 to 1024 do if V_R[i] = 1 then RTT[i] := max{RTT[i], t_X}
      • /* timestamp data, part of the transaction; either directly or via an X-id → t_X mapping */ for each d in V_I do d.TT := t_X
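The two commit steps translate directly. This sketch assumes the bit-vector layout of a `bytearray` V_R with 0-based buckets, and stamps d.TT directly in a dict `tt` rather than through an X-id → t_X mapping. The function name `commit` is hypothetical.

```python
def commit(t_x: int, v_r: bytearray, rtt: list[int], v_i: set[str], tt: dict[str, int]) -> None:
    """Commit processing for transaction X with timestamp t_x (0-based buckets)."""
    # update volatile RTT: RTT[i] := max{RTT[i], t_X} for every bucket X has read
    for i in range(len(rtt)):
        if v_r[i >> 3] & (1 << (i & 7)):
            rtt[i] = max(rtt[i], t_x)
    # timestamp data (part of the transaction): d.TT := t_X for each new version
    for d in v_i:
        tt[d] = t_x


rtt = [0] * 16                 # a tiny 16-entry RTT for illustration
v_r = bytearray(2)             # 16 bits
v_r[0] |= 1 << 3               # X read something in bucket 3
v_r[1] |= 1 << 1               # ... and in bucket 9
rtt[9] = 20                    # a later reader already raised bucket 9
tt: dict[str, int] = {}
commit(15, v_r, rtt, {"a", "b"}, tt)
print(rtt[3], rtt[9], tt)      # 15 20 and both new versions stamped 15
```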
    • System Crashes
      • Observation
        • timestamping of committed transactions is safe
        • the volatile RTT is lost, but
          • so are the crash-interrupted transactions that needed it
          • committed transactions do not need the RTT
        • the last commit time precedes the crash time
        • each new X will start and commit after the crash time
      • Recovery Action
        • RTT[i] := last commit time (for all i)
        • conservative read/write synchronization detection without penalty
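The recovery action and the reason it costs nothing can be shown in a few lines. The assumption is that the clock after restart is later than the last commit time, so every new t_s exceeds every RTT entry. `recover_rtt` is a hypothetical name. The values (last commit 100, new start 105, bucket 42, d.TT = 90) are made up for illustration.

```python
RTT_SIZE = 1024


def recover_rtt(last_commit_time: int, size: int = RTT_SIZE) -> list[int]:
    """Rebuild the lost volatile RTT conservatively: every entry = last commit time."""
    return [last_commit_time] * size


rtt = recover_rtt(last_commit_time=100)
t_s = 105                       # any new X starts after the crash time
t_l = max(t_s, rtt[42], 90)     # Write(d) check, with d in bucket 42 and d.TT = 90
print(t_l)                      # 105: the conservative entry does not raise t_l
```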
    • Summary
      • transaction-consistent view on historical data
        • timestamp order consistent with transaction serialization order
      • Simple timestamp selection at commit time
      • Solution for distributed transactions with 2PC
      • Solution for "CURRENT_TIME" requests
    • Outlook
      • Impact on multiversion concurrency control
        • Read-Only Multiversion, Snapshot Isolation [Weikum + Vossen 01]
    • Questions