1Gray & Reuter: Resource Manager
Resource Managers
        Mon         Tue            Wed           Thur             Fri
9:00    Overview    TP mons        Log           Files & Buffers  B-tree
11:00   Faults      Lock Theory    ResMgr        COM+             Access Paths
1:30    Tolerance   Lock Techniq   CICS & Inet   Corba            Groupware
3:30    T Models    Queues         Adv TM        Replication      Benchmark
7:00    Party       Workflow       Cyberbrick    Party
Jim Gray
Microsoft, Gray @ Microsoft.com
Andreas Reuter
International University, Andreas.Reuter@i-u.de
2Gray & Reuter: Resource Manager
Whirlwind Tour: The Actors
Resource managers
– Provide ACID objects (transactional objects)
– Use log manager to record changes
– Use transaction manager to coordinate multi-RM changes
– Use communication manager to make transactional RPCs
[Figure: two systems, each with a transaction manager, a log manager with its log, a communication manager, and resource managers holding objects in volatile and durable storage.]
3Gray & Reuter: Resource Manager
Whirlwind Tour: the Application Verbs
TRID Begin_Work(context *); /* begin a transaction */
Boolean Commit_Work(context *); /* commit the transaction */
void Abort_Work(void); /* rollback to savepoint zero */
savepoint Save_Work(context *); /* establish a savepoint */
savepoint Rollback_Work(savepoint); /*return to savept (savept 0 = abort)*/
Boolean Prepare_Work(context *); /* put transaction in prepared state */
context Read_Context(void); /* return current savepoint context */
TRID Chain_Work(context *); /* end current and start next trans */
TRID My_Trid(void); /* return current transaction identifier*/
TRID Leave_Transaction(void); /* set process trid null, return current id */
Boolean Resume_Transaction(TRID); /* set process trid to desired trid */
enum tran_status { ACTIVE , PREPARED , ABORTING , COMMITTING , ABORTED , COMMITTED};
tran_status Status_Transaction(TRID); /* transaction identifier status */
4Gray & Reuter: Resource Manager
Whirlwind Tour
Types Of Transaction Executions
Shaded stuff is “undone”.
[Figure: four example executions, each drawn as a sequence of Begin, Action, and Save steps: a simple commit, a simple abort, a partial rollback, and a persistent transaction surviving a system restart.]
5Gray & Reuter: Resource Manager
Whirlwind Tour: the TRID Flow
Call graph: who calls whom.
TRIDs flow on all such calls.
Application is typically root.
RM can be an application (use a transactional RM to store state)
[Figure: call graph from the application through application servers and transaction application servers to resource managers.]
6Gray & Reuter: Resource Manager
Whirlwind Tour: Normal (No Failure) Transaction Execution
TM generates the TRID at Begin_Work() and coordinates commit.
RM joins work, generates log records, allows commit.
[Figure: normal execution. The application calls Begin_Work() and Commit_Work() on the transaction manager and sends work requests to the resource manager. The resource manager calls Join_Work, sends lock requests to the lock manager and log records to the log manager, and may send work requests onward. At commit the transaction manager asks "Commit Phase 1?" (yes/no), writes the commit log record and forces the log, then runs commit phase 2 and receives an ack.]
7Gray & Reuter: Resource Manager
WW tour: The Resource Manager view
[Figure: the resource manager and its interfaces]
Resource manager's own service interface: rmCall(...) invocation and response
Calls to other resource managers: rmCall(...)
TP monitor: administrative functions and callbacks to install, start, and schedule a resource manager
Transaction management callbacks (depends on application): Save, Prepare, Commit, UNDO, REDO, Checkpoint
Transaction manager functions: Identify, SaveWork, RollbackWork, Join, StatusTransaction, Leave, Resume
8Gray & Reuter: Resource Manager
WW tour: The Resource manager view
Boolean Savepoint(LSN *); /* invoked at tran Save_Work(). Returns RM vote */
Boolean Prepare(LSN *); /* invoked at phase 1. Return vote on commit */
void Commit(); /* called at commit phase 2 */
void Abort(); /* called at failed commit phase 2 or abort */
void UNDO(LSN); /* Undo the log record with this LSN */
void REDO(LSN); /* Redo the log record with this LSN */
Boolean UNDO_Savepoint(LSN); /* Vote TRUE if can return to savepoint */
void REDO_Savepoint(LSN); /* Redo a savepoint. */
void TM_Startup(LSN); /* TM restarting. Passes RM ckpt LSN */
LSN Checkpoint(LSN * low_water); /* TM checkpointing. Return RM ckpt LSN, set low water LSN */
Boolean Join_Work(RMID, TRID); /* Become part of a transaction */
9Gray & Reuter: Resource Manager
WW Tour: The Transaction Manager
Transaction rollback.
coordinates transaction rollback to a savepoint or abort
rollbacks can be initiated by any participant.
Resource manager restart.
If an RM fails and restarts, TM presents checkpoint anchor & RM undo/redo log
System restart.
TM drives local RM recovery (like RM restart)
TM resolves any in-doubt distributed transactions
Media recovery.
TM helps RM reconstruct damaged objects by providing
archive copies of object + the log of object since archived.
Node restart.
Transaction commit among independent TMs when a TM fails.
10Gray & Reuter: Resource Manager
WW Tour: When a Transaction Aborts
At transaction rollback
TM drives undo of each RM joined to the transaction
Can be to savepoint 0 (abort) or partial rollback.
[Figure: transaction abort. The application calls Begin_Work() and later Rollback_Work(). The transaction manager reads the transaction's log records and calls Undo(log record) on the resource manager, writes an abort record in the log, and signals Aborted(transid). The resource manager, lock manager, and log manager are connected as in normal execution (Join_Work, work requests, lock requests, log records).]
11Gray & Reuter: Resource Manager
WW tour: the Transaction Manager
at Restart/Recovery
At restart, TM reading the log drives RM recovery.
Single log scan.
Single resolver of transactions.
Multiple logs possible, but more complex/more work.
[Figure: restart. The transaction manager finds the checkpoint, reads log records forward from the log manager, and calls Redo(log record) on the resource manager for each operation. At the end it undoes soft savepoints and unfinished transactions by calling Undo(log record).]
12Gray & Reuter: Resource Manager
End of Whirl-Wind Tour
13Gray & Reuter: Resource Manager
Resource Manager Concepts:
Undo Redo Protocol
DO-UNDO-REDO Protocol
[Figure: DO takes the old state to the new state and produces a log record. UNDO takes the new state and the log record back to the old state. REDO takes the old state and the log record to the new state.]
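The protocol can be sketched for a single integer object. This is a minimal illustration, not the book's code: the `log_record` layout and the function names `do_update`, `undo_update`, and `redo_update` are assumptions made for the example, and the record is physical (it stores both old and new value).

```c
#include <assert.h>

/* Hypothetical log record for one integer object: it keeps both the old and
 * the new value, so the same record serves UNDO and REDO. */
typedef struct {
    int old_value;
    int new_value;
} log_record;

/* DO: old state -> new state, plus a log record describing the change. */
log_record do_update(int *state, int new_value)
{
    log_record rec = { *state, new_value };
    *state = new_value;
    return rec;
}

/* UNDO: new state + log record -> old state. */
void undo_update(int *state, const log_record *rec)
{
    *state = rec->old_value;
}

/* REDO: old state + log record -> new state. */
void redo_update(int *state, const log_record *rec)
{
    *state = rec->new_value;
}

/* Usage: DO, then UNDO, then REDO on a single object. */
void do_undo_redo_example(void)
{
    int balance = 10;
    log_record rec = do_update(&balance, 25);
    assert(balance == 25);
    undo_update(&balance, &rec);
    assert(balance == 10);
    redo_update(&balance, &rec);
    assert(balance == 25);
}
```

Because the record carries complete old and new values, neither UNDO nor REDO needs to know what the operation was.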
14Gray & Reuter: Resource Manager
Resource Manager Concepts:
Transaction UNDO Protocol
declare cursor for transaction_log
select rmid, lsn /* a cursor on the transaction's log */
from log /* it returns the resource manager name */
where trid = :trid /* and record id (log sequence number) */
descending lsn; /* and returns records in LIFO order */
void transaction_undo(TRID trid) /* Undo the specified transaction. */
{ int sqlcode; /* event variables set by sql */
open cursor transaction_log; /* open an sql cursor on the trans log */
while (TRUE) /* scan trans log backwards & undo each*/
{ /* fetch the next most recent log rec */
fetch transaction_log into :rmid, :lsn; /* */
if (sqlcode != 0) break; /* if no more, trans is undone, end loop*/
rmid.undo(lsn); /* tell RM to undo that record */
} /* end of the undo scan loop */
close cursor transaction_log; /* Undo scan is complete, close cursor */
}; /* return to caller */
• If the UNDO is to a savepoint, the scan stops at the desired savepoint.
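The same backward scan can be sketched without SQL over an in-memory log. Everything here is an assumption made for illustration: the log is an array whose index plays the role of the LSN, `rm_undo` is a hypothetical table of UNDO callbacks indexed by rmid, and `stop_lsn` marks the savepoint (0 undoes the whole transaction).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RM 4

typedef int TRID;
typedef int LSN;                     /* here: index of the record in the log array */

typedef struct {
    TRID trid;                       /* transaction that wrote the record     */
    int  rmid;                       /* resource manager that wrote the record */
} log_entry;

typedef void (*undo_fn)(LSN lsn);    /* shape of an RM's UNDO callback */

static undo_fn rm_undo[MAX_RM];      /* hypothetical callback table, indexed by rmid */

/* Records which LSNs were undone, in order, so the scan can be inspected. */
static LSN undo_trace[16];
static int undo_count;
static void record_undo(LSN lsn) { undo_trace[undo_count++] = lsn; }

/* Scan the log backwards (LIFO) and call the owning RM's UNDO for every
 * record of trid whose LSN is >= stop_lsn. Returns the number undone. */
int transaction_undo(const log_entry *log, size_t n, TRID trid, LSN stop_lsn)
{
    int undone = 0;
    for (size_t i = n; i > 0 && (LSN)(i - 1) >= stop_lsn; i--) {
        const log_entry *e = &log[i - 1];
        if (e->trid != trid)
            continue;                /* record belongs to another transaction */
        rm_undo[e->rmid]((LSN)(i - 1));
        undone++;
    }
    return undone;
}
```

A real transaction manager would follow a per-transaction chain of log records rather than skipping over other transactions' records; the linear scan only keeps the sketch short.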
15Gray & Reuter: Resource Manager
Resource Manager Concepts:
Restart REDO Protocol
Note: REDO forwards, UNDO backwards
void log_redo(void) /* Redo every record in the log. */
{ int sqlcode; /* event variable set by sql */
declare cursor for the_log /* declare cursor from log start forward */
select rmid, lsn /* gets RM id and log record id (lsn) */
from log /* of all log records. */
ascending lsn; /* in FIFO order */
open cursor the_log; /* open an sql cursor on the log table */
while (TRUE) /* Scan log forward& redo each record. */
{ fetch the_log into :rmid, :lsn; /* fetch the next log record */
if (sqlcode != 0) break; /* if no more, then all redone, end loop */
rmid.redo(lsn);} /* tell RM to redo that record */
close cursor the_log; /* Redo scan complete, close cursor */
}; /* return to caller */
16Gray & Reuter: Resource Manager
Idempotence
F(F(X)) == F(X): Needed in case restart fails (and restarts)
Redo(Redo(old_state,log), log) = Redo(new_state,log) = new_state
Undo(Undo(new_state,log), log) = Undo(old_state,log) = old_state
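[Figure: redo applied with the log record yields the new state whether it starts from the old state or the new state; undo likewise always yields the old state.]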
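A small contrast shows why this matters. The sketch below assumes a hypothetical `update_rec` that carries both a delta and the resulting value: redo by value can be repeated safely, redo by delta cannot.

```c
#include <assert.h>

/* Hypothetical log record carrying both forms of the same update. */
typedef struct {
    int delta;       /* logical form: "add delta"          */
    int new_value;   /* physical form: "value becomes this" */
} update_rec;

/* Idempotent: applying it twice gives the same state as applying it once. */
void redo_by_value(int *state, const update_rec *r)
{
    *state = r->new_value;
}

/* Not idempotent: every repetition adds delta again. */
void redo_by_delta(int *state, const update_rec *r)
{
    *state += r->delta;
}

/* Usage: a restart that fails and restarts replays the same record twice. */
void idempotence_example(void)
{
    update_rec r = { 5, 15 };
    int a = 10, b = 10;
    redo_by_value(&a, &r);
    redo_by_value(&a, &r);
    assert(a == 15);             /* F(F(X)) == F(X) */
    redo_by_delta(&b, &r);
    redo_by_delta(&b, &r);
    assert(b == 20);             /* repeated redo corrupted the state */
}
```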
17Gray & Reuter: Resource Manager
Testable State: Can Tell If It Happened.
IF operation not idempotent AND state not testable
THEN recovery is impossible
ELSE for F in {UNDO, REDO}:
not testable: WHILE (! ACK) F(F(X))
testable: WHILE ( not desired state) {F(x)}
[Figure: a test applied to an unknown state tells whether it is the old state or the new state.]
18Gray & Reuter: Resource Manager
Real Operations: Can Not Be Undone
Defer operations until commit is assured.
Perform as part of Phase 2 of commit
If must undo for some reason,
generate compensation log record
to be processed by some higher authority.
[Figure: DO, UNDO, and REDO for real operations. DO writes a log record and leaves the old state in place; the new state appears at commit; a later UNDO produces a compensation log record.]
19Gray & Reuter: Resource Manager
Example: Communications Session RM
Ops are idempotent (sequence numbers)
and testable (sequence numbers)
Session And Message Recovery Actions

          Sender                               Receiver
DO        log message & seqno; send            log message & seqno; acknowledge; do it
UNDO      send cancellation                    log cancellation message; return to
          (generates log record)               savepoint; acknowledge
REDO      resend message                       if not duplicate <normal DO processing>
                                               else just acknowledge
COMMIT    send any deferred (real) messages    establish savepoint
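The receiver's duplicate test can be sketched as follows. This is an illustration under stated assumptions, not the session protocol itself: the `session_receiver` struct and `receive` function are hypothetical, messages are assumed to arrive in order, and gaps in the sequence are not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receiver-side session state. */
typedef struct {
    unsigned next_expected;   /* sequence number of the next new message */
    int      processed;       /* how many messages were actually processed */
} session_receiver;

/* Handle one incoming message. Returns true if it was new and processed,
 * false if it was a duplicate. In both cases the caller acknowledges seqno,
 * which is what makes a resend (REDO on the sender side) harmless. */
bool receive(session_receiver *r, unsigned seqno)
{
    if (seqno < r->next_expected)
        return false;                 /* duplicate: just acknowledge */
    r->next_expected = seqno + 1;     /* normal DO processing would go here */
    r->processed++;
    return true;
}

/* Usage: message 1 is resent after a sender restart and is not processed twice. */
void session_example(void)
{
    session_receiver r = { 1, 0 };
    assert(receive(&r, 1));
    assert(!receive(&r, 1));
    assert(receive(&r, 2));
    assert(r.processed == 2);
}
```

The sequence number does both jobs named on the slide: comparing it to `next_expected` is the test, and ignoring old numbers makes the operation idempotent.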
20Gray & Reuter: Resource Manager
Kinds of Logging
Physical:
Keep old and new value of container (page, file,...)
Pro: Simple
Allows recovery of physical object (e.g. broken page)
Con: Generates LOTS of log data
Logical:
Keep call params such that you can compute F(x), F^-1(x)
Pro: Sounds simple
Compact log.
Con: Doesn't work (wrong failure model).
Operations do not fail cleanly.
21Gray & Reuter: Resource Manager
Sample Physical LOG RECORD
Ordinary sequential insert is OK.
Update of sorted (B-tree) page:
update LSN
update page space map
update pointer to record
insert record at correct spot (move 1/2 the others)
Essentially writes whole page (old and new).
16KB log records for 100-byte updates.
struct compressed_log_record_for_page_update /* */
{ int opcode; /* opcode will say compressed page update*/
filename fname; /* name of file that was updated */
long pageno; /* page that was updated */
long offset; /* offset within page that was updated */
long length; /* length of field that was updated */
char old_value[length]; /* old value of field */
char new_value[length]; /* new value of field */
}; /* */
22Gray & Reuter: Resource Manager
Sample Logical LOG RECORD
Very compact.
Implies page update(s) for record (may be many pages long).
Implies index updates (there may be many indices on the base table).
struct logical_log_record_for_insert /* */
{ int opcode; /* opcode will say insert */
filename fname; /* name of file that was updated */
long length; /* length of record that was updated */
char record[length]; /* value record */
}; /* */
23Gray & Reuter: Resource Manager
The trouble with Logical Logging
Logical logging needs to start UNDO/REDO with an action-consistent state.
No half completed operations.
for example: insert (table, record)
ALL or NONE of the indices should be updated
when logical UNDO/REDO is invoked.
Problem:
Failure model is Page & Message action consistency
(Lampson /Sturgis model of Chapter 3).
Actions can fail due to:
Logic: e.g. duplicate key.
Limit: ran out of space
Contention: deadlock
Media: broken page or session
System: computer failure/restart
24Gray & Reuter: Resource Manager
Making Logical Logging Work: Shadows
Keep old copy of each page
Reset page to old copy at abort (no undo log)
Discard old copy at commit.
Handles all online failures due to:
Logic: e.g. duplicate key.
Limit: ran out of space
Contention: deadlock
Problem: forces page locking, only one updater per page.
What about restart?
Need to atomically write out all changed pages.
25Gray & Reuter: Resource Manager
Making Logical Logging Work: Shadows
Perform same shadow trick at disc level.
Keep shadow copy of old pages.
Write out new pages.
In one careful write, write out new page root.
Makes update atomic
[Figure: a shadow update. Old and new versions of the free space bit map and directory point at data pages A, B, C; the new version points at the updated copies.]
26Gray & Reuter: Resource Manager
Shadows
Pro: Simple
Not such a bad deal with non-volatile ram
Con: page locking
extra space
extra overhead (for page maps)
extra IO
declusters sequential data
27Gray & Reuter: Resource Manager
Compromise Physio-Logical Logging
Physio-Logical Logging
Physical to a "page" (physical container)
Logical within a "page".
Keep old and new value of container (page, file,...)
Pro: Simple
Allows recovery of physical object (e.g. broken page)
Con: Generates LOTS of log data
28Gray & Reuter: Resource Manager
Logical vs Physio-logical Logging
Insert record r into table A
[Figure: table A with indices B and C, shown once for each logging style.]
Logical log record:
insert, A, r
Physiological log records:
insert, A, page 508, r
insert, B, page 72, s
insert, C, page 94, t
Note: physical log records would be bigger for sorted pages.
29Gray & Reuter: Resource Manager
Physiological Logging Rules
Complex operations are a sequence of simple operations on pages and
messages.
Each operation is constructed as a mini-transaction:
lock the object in exclusive mode
transform the object
generate an UNDO-REDO log record
record log LSN in object
unlock the object.
Action Consistent Object:
When object semaphore free, no ops in progress.
Log-Consistency:
contains log records of all complete page/msg actions.
30Gray & Reuter: Resource Manager
Physiological Logging Rules
Online Operation - Only Need the Fix Rule
Each operation is structured as a mini-transaction.
Each operation generates an UNDO record.
No page operation fails with the semaphore set.
(exception handler must clean up state
and UNFIX any pages).
Then Rollback can be
physical to a page/session/container and
logical within page/session/container.
31Gray & Reuter: Resource Manager
Physiological Logging Rules
Restart Operation - Need WAL and F@C
Need Page-Action consistent disc state.
Pages are action consistent.
Committed actions can be redone from log.
Uncommitted actions can be undone from log.
WAL: Write Ahead Log
Write undo/redo log records before overwriting disc page
Only write action-consistent pages
Force-Log-At-Commit
Make transaction log records durable at commit.
32Gray & Reuter: Resource Manager
Physiological Logging Rules
WAL and F@C
WAL: Write Ahead Log
write page:
get page semaphore
copy page
give page semaphore /* avoids holding semaphore during IO */
Force_log(Page(LSN)) /*WAL logic, probably already flushed*/
Write copy to disc.
WAL gives idempotence and testability.
Force-Log-At-Commit
At commit phase 1:
Force_log(transaction.max_lsn)
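The write-page steps can be sketched in a single-threaded simulation. All names here are assumptions for illustration: `volatile_log_lsn` and `durable_log_lsn` stand in for the log manager's state, `disk_page` stands in for the persistent page, and the page semaphore is omitted because nothing runs concurrently.

```c
#include <assert.h>
#include <string.h>

typedef long LSN;
typedef struct { LSN lsn; char data[64]; } page_t;

static LSN    volatile_log_lsn = 0;  /* highest LSN inserted into the log buffer  */
static LSN    durable_log_lsn  = 0;  /* highest LSN forced to durable storage     */
static page_t disk_page;             /* stand-in for the persistent page version  */

/* Simulated log insert: hands out the next LSN. */
LSN log_insert_sim(void) { return ++volatile_log_lsn; }

/* Make the log durable up to lsn; a no-op if it was already flushed. */
void force_log(LSN lsn)
{
    assert(lsn <= volatile_log_lsn);
    if (lsn > durable_log_lsn)
        durable_log_lsn = lsn;
}

/* WAL page write: copy the page, force the log up to the page LSN,
 * and only then overwrite the persistent version. */
void write_page(const page_t *p)
{
    page_t copy = *p;                /* taken under the page semaphore in real code */
    force_log(copy.lsn);             /* write-ahead: log first */
    disk_page = copy;                /* then the page */
    assert(disk_page.lsn <= durable_log_lsn);   /* PVlsn <= DLlsn */
}

/* Usage: update a page, then write it out. */
void wal_example(void)
{
    page_t p;
    memset(&p, 0, sizeof p);
    p.lsn = log_insert_sim();
    strcpy(p.data, "new");
    write_page(&p);
    assert(durable_log_lsn >= p.lsn);
}
```

The final assertion in `write_page` is the invariant the rule exists to protect: no persistent page is ever newer than the durable log.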
33Gray & Reuter: Resource Manager
WAL & F@C in Pictures
[Figure: timeline of LSNs for volatile page versions (VVlsn), volatile log records (VLlsn), persistent page versions (PVlsn), and durable log records (DLlsn).]
online:VVlsn = VLlsn
restart: DLlsn <= VVlsn
PVlsn <= DLlsn
Commit:
commit_lsn <= DLlsn
At restart all volatile memory is reset and must be
reconstructed from persistent memory.
restart:
PVlsn <= DLlsn
commit_lsn <= DLlsn
FIX, WAL and F@C assure these assertions
34Gray & Reuter: Resource Manager
The One Bit Resource Manager
Manages an array of transactional bits (the free space bit map).
i = get_bit(); /* gets a free bit and sets it */
give_bit(i); /* returns a free bit (when transaction commits) */
35Gray & Reuter: Resource Manager
The Bitmap and Its Log Records
The Data Structure
struct { /* layout of the one-bit RM data structure */
LSN lsn; /* page LSN for WAL protocol */
xsemaphore sem; /* semaphore regulates access to the page */
Boolean bit[BITS]; /* page.bit[i] = TRUE => bit[i] is free */
} page; /* allocates the page structure */
The Log Records
struct /* log record format for the one-bit RM */
{ int index; /* index of bit that was updated */
Boolean value; /* new value of bit[index] */
} log_rec; /* log record used by the one-bit RM */
const int rec_size = sizeof(log_rec); /*size of the log record body. */
36Gray & Reuter: Resource Manager
Page and Log Consistency for 1-Bit RM
Data is dirty if it reflects an uncommitted transaction update.
Otherwise, data is clean.
Page Consistency:
• No clean free bit has been given to any transaction.
• Every clean busy bit was given to exactly one transaction.
• Dirty bits are locked in X mode by the updating transactions.
• The page.lsn reflects most recent log record for page.
Log Consistency:
• Log contains a record for every completed
mini-transaction update to the page.
37Gray & Reuter: Resource Manager
give_bit()
get_bit() & give_bit(i) temporarily violate page consistency.
Mini-transaction holds semaphore while violating consistency.
Makes page & log mutually consistent before releasing sem.
=> each mini-transaction observes a consistent page state.
void give_bit(int i)                                /* free a bit                              */
{ if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0))   /* lock bit                                */
    { Xsem_get(&page.sem);                          /* get page sem                            */
      page.bit[i] = TRUE;                           /* free the bit                            */
      log_rec.index = i;                            /* generate log rec                        */
      log_rec.value = TRUE;                         /* saying bit is free                      */
      page.lsn = log_insert(log_rec,rec_size);      /* write log rec & update lsn              */
      Xsem_give(&page.sem); }                       /* page consistent                         */
  else                                              /* if lock failed, caller doesn't own bit, */
    Abort_Work();                                   /* in that case abort caller's trans       */
  return; }                                         /*                                         */
38Gray & Reuter: Resource Manager
get_bit()
int get_bit(void)                                        /* allocate a bit and return its index      */
{ int i;                                                 /* loop variable                            */
  Xsem_get(&page.sem);                                   /* get the page semaphore                   */
  for (i = 0; i < BITS; i++)                             /* loop looking for a free bit              */
    { if (page.bit[i])                                   /* if bit is free, may be dirty (so locked) */
        { if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0)) /* lock bit                                */
            { page.bit[i] = FALSE;                       /* got lock on it, so it was free           */
              log_rec.value = FALSE;                     /* generate log rec describing update       */
              log_rec.index = i;                         /*                                          */
              page.lsn = log_insert(log_rec,rec_size);   /* write log rec & update lsn               */
              Xsem_give(&page.sem);                      /* page now consistent, give up sem         */
              return i; }                                /* return to caller                         */
        }                                                /* else lock bounce so bit dirty            */
    }                                                    /* try next free bit                        */
  Xsem_give(&page.sem);                                  /* if no free bits, give up semaphore       */
  Abort_Work();                                          /* abort transaction                        */
  return -1; }                                           /* returns -1 if no bits are available      */
39Gray & Reuter: Resource Manager
Compensation Logging
Undo may generate a log record recording undo step
Makes Page LSN monotonic
Similar technique was used for Communication Manager
(session sequence number was monotonic)
[Figure: UNDO takes the new state and the log record to a logical old state and writes a compensation log record.]
40Gray & Reuter: Resource Manager
1-bit RM UNDO Callback
void undo(LSN lsn) /* undo a one-bit RM operation */
{ int i; /* bit index */
Boolean value; /* old bit value from log rec to be undone*/
log_rec_header header; /* buffer to hold log record header */
rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log rec */
Xsem_get(&page.sem); /* get the page semaphore */
i = log_rec.index; /* get bit index from log record */
value = ! log_rec.value; /* get complement of new bit value */
page.bit[i] = value; /* update bit to old value */
log_rec.value= value; /* make a compensation log record */
page.lsn = log_insert(log_rec,rec_size); /* log it and bump page lsn */
Xsem_give(&page.sem); /* free the page semaphore */
return; } /* */
41Gray & Reuter: Resource Manager
1-bit RM Checkpoint Callback
LSN checkpoint(LSN * low_water) /* copy 1-page RM state to persistent store*/
{ Xsem_get(&page.sem); /* get the page semaphore */
*low_water = log_flush(page.lsn); /* WAL force up to page lsn, and */
/* set low water mark */
write(file,page,0,sizeof(page)); /* write page to persistent memory */
Xsem_give(&page.sem); /* give page semaphore */
return NULLlsn; } /* return checkpoint lsn (none needed) */
42Gray & Reuter: Resource Manager
1-bit RM REDO Callback
void redo(LSN lsn) /* redo a free space operation */
{ int i; /* bit index */
Boolean value; /* new bit value from log rec to be redone*/
log_rec_header header; /* buffer to hold log record header */
rec_size = log_read_lsn(lsn,header,0,log_rec,big); /* read log record */
i = log_rec.index; /* Get bit index */
lock(i,LOCK_X,LOCK_LONG,0); /* get lock on the bit (often not needed)*/
Xsem_get(&page.sem); /* get the page semaphore */
if (page.lsn < lsn) /* if bit version older than log record */
{ value= log_rec.value; /* then redo the op. get new bit value */
page.bit[i] = value; /* apply new bit value to bit */
page.lsn = lsn; } /* advance the page lsn */
Xsem_give(&page.sem); /* free the page semaphore */
return; }; /* */
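The effect of the `page.lsn < lsn` test can be seen in a stand-alone simulation. It is a sketch under assumptions: the page and log records are simplified copies of the structures above with the LSN stored inside each record, locking and the semaphore are left out, and `redo_scan` is a hypothetical helper that replays an in-memory log.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS 8

typedef long LSN;
typedef struct { LSN lsn; bool bit[BITS]; } page_t;          /* true means free */
typedef struct { LSN lsn; int index; bool value; } log_rec_t;

/* Redo one record only if the page version is older than the record.
 * Returns true if the record was applied. */
bool redo(page_t *page, const log_rec_t *rec)
{
    if (page->lsn >= rec->lsn)
        return false;                     /* page already reflects this update */
    page->bit[rec->index] = rec->value;
    page->lsn = rec->lsn;                 /* advance the page LSN */
    return true;
}

/* Replay the whole log forward; returns how many records were applied. */
int redo_scan(page_t *page, const log_rec_t *log, int n)
{
    int applied = 0;
    for (int i = 0; i < n; i++)
        if (redo(page, &log[i]))
            applied++;
    return applied;
}

/* Usage: the persistent page was written at LSN 1; the log runs to LSN 3. */
void redo_example(void)
{
    page_t page = { 1, { false, true, true, true, true, true, true, true } };
    log_rec_t log[] = { { 1, 0, false }, { 2, 3, false }, { 3, 0, true } };
    assert(redo_scan(&page, log, 3) == 2);   /* record 1 is skipped */
    assert(redo_scan(&page, log, 3) == 0);   /* a second restart changes nothing */
    assert(page.lsn == 3 && page.bit[0] && !page.bit[3]);
}
```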
43Gray & Reuter: Resource Manager
1-Bit RM Noise Callbacks
Boolean prepare(LSN * lsn) /* 1-bit RM has no phase 1 work */
{*lsn = NULLlsn; return TRUE ;}; /* */
void Commit(void ) /* Commit release locks & */
{ unlock_class(LOCK_LONG, TRUE, MyRMID()); }; /* return */
void Abort(void ) /* Abort release all locks & */
{ unlock_class(LOCK_LONG, TRUE, MyRMID()); }; /* return */
Boolean savepoint(LSN * lsn) /* no work to do at savepoint */
{*lsn = NULLlsn; return TRUE ;}; /* */
void UNDO_savepoint(LSN lsn) /* rollback work or abort transaction */
{if (savepoint == 0) /* if at savepoint zero (abort) */
unlock_class(LOCK_LONG, TRUE, MyRMID()); /* release all locks */
}; /* */
44Gray & Reuter: Resource Manager
Summary
Model: Complex actions are a page/message action sequence.
LSN: Each page carries an LSN and a semaphore.
ReadFix: Read actions get semaphore in shared mode.
WriteFix: Update actions get semaphore in exclusive mode,
generate one or more log records covering the page,
advance the page LSN to match highest LSN
give semaphore
WAL: log_flush(page.LSN) before overwriting persistent page
F@C: force all log records up to the commit LSN at commit
Compensation Logging: Invalidate undone log record with a
compensating log record.
Idempotence via LSN: page LSN makes REDO idempotent
45Gray & Reuter: Resource Manager
Two Phase Commit
Getting two or more logs to agree
Getting two or more RMs to agree
Atomically and Durably
Even in case one of them fails and restarts.
The TM phases
Prepare. Invoke each joined RM asking for its vote.
Decide. If all vote yes, durably write commit log record.
Commit. Invoke each joined RM, telling it the commit decision.
Complete. Write commit completion when all RM ACK.
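The four TM phases can be sketched as one coordinator function driving RM callbacks. This is a simplified illustration with assumed names (`rm_t`, `two_phase_commit`, and the `vote` and `outcome` fields are hypothetical): log writes are reduced to a flag, there are no timeouts or restarts, and on a "no" vote the sketch calls each abort callback directly without the UNDO scan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource manager with the callbacks the TM invokes. */
typedef struct rm {
    bool (*prepare)(struct rm *self);     /* phase 1: return the RM's vote */
    void (*commit)(struct rm *self);      /* phase 2: commit decision      */
    void (*abort_work)(struct rm *self);  /* abort decision                */
    bool vote;                            /* vote this test RM will cast   */
    int  outcome;                         /* 0 undecided, 1 committed, -1 aborted */
} rm_t;

typedef enum { TX_COMMITTED, TX_ABORTED } tx_outcome;

/* Trivial callbacks used to exercise the coordinator. */
static bool rm_prepare(rm_t *self) { return self->vote; }
static void rm_commit(rm_t *self)  { self->outcome = 1; }
static void rm_abort(rm_t *self)   { self->outcome = -1; }

tx_outcome two_phase_commit(rm_t *rms[], int n, bool *commit_record_written)
{
    *commit_record_written = false;

    /* Prepare: ask every joined RM for its vote. */
    for (int i = 0; i < n; i++) {
        if (!rms[i]->prepare(rms[i])) {
            for (int j = 0; j < n; j++)
                rms[j]->abort_work(rms[j]);
            return TX_ABORTED;
        }
    }

    /* Decide: all voted yes, so the commit record is written durably.
     * The flag stands in for forcing the log. */
    *commit_record_written = true;

    /* Commit: tell every joined RM the decision. */
    for (int i = 0; i < n; i++)
        rms[i]->commit(rms[i]);

    /* Complete: a real TM would now write the completion record (lazily). */
    return TX_COMMITTED;
}
```

The transaction is committed at the moment the commit record is durable; everything after that point only informs the participants.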
46Gray & Reuter: Resource Manager
Centralized Case of Two Phase Commit
Each participant: (TM &RM) goes through a
sequence of states
These generate log records
[Figure: state diagram with states Null, Active, Prepared, Committing, Committed, Aborting, Aborted.]
47Gray & Reuter: Resource Manager
Examples
Committed                 Aborted
begin                     begin
DO rm1                    DO rm1
DO rm2                    DO rm2
DO rm2                    DO rm2
prepare rm2 {locks}       UNDO rm2
commit { rm1, rm2}        UNDO rm2
complete                  UNDO rm1
                          UNDO begin { rm1, rm2}
                          complete
48Gray & Reuter: Resource Manager
Transitions in Case of Restart
[Figure: the same state diagram (Null, Active, Prepared, Committing, Committed, Aborting, Aborted) with the transitions taken at restart.]
Active state not persistent, others are persistent
For both TM and RM.
Log records make them persistent (redo)
TM tries to drive states to the right. (to committed, aborted)
49Gray & Reuter: Resource Manager
Successful two phase commit
Message/Call flow from TM to each RM joined to transaction
If TM and RM share the same log,
the RM FORCE can piggyback on the TM FORCE
One IO to commit a transaction (less if commit is grouped)
[Figure: message flow for a successful commit]
Coordinator (states: Active, Prepared, Committing, Committed):
  local prepare (lazy); send Prepare
  on "yes": write commit record in log (force); send Commit
  local commit work (lazy)
  on Ack: write completion record in log (lazy)
Participant (states: Active, Prepared, Committing, Committed):
  on Prepare: local prepare; write prepare record in log (force); reply yes
  on Commit: local commit work; write completion record in log (lazy); Ack when durable
50Gray & Reuter: Resource Manager
Abort Two Phase Commit
If RM sends "NO" or no response (timeout), TM starts abort.
Calls UNDO of each trans log record
May stop at a savepoint.
When the undo scan reaches begin_trans, it calls the ABORT() callback of each joined RM.
51Gray & Reuter: Resource Manager
Distributed two phase commit
Tracking joined TMs -- the communications manager helps
Much as TRPC helps in the local case.
Root TM owes a Prepare/Commit/Abort message to each joined TM.
Joined TM does "local" commit.
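[Figure: a call from transaction manager A's node to the callee's node. The caller's communications manager sends (trid, data) over the session; if this is the first time, it tells transaction manager A that the trid is outgoing to B. The callee's communications manager, if this is the first time, tells its transaction manager that the trid is incoming from A.]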
52Gray & Reuter: Resource Manager
Full Transaction State Diagram
Next section explains how these states are implemented.
[Figure: full transaction state diagram. Labels: null, save point 0, begun, save point 1, save point n active, persistent save point n, prepared, committing, committed, aborting, aborted. The states are grouped into volatile, persistent, and durable states, and into live states and complete states.]
53Gray & Reuter: Resource Manager
Summary of Resource Manager Concepts
DO/UNDO/REDO
Idempotent, Testable, Real operations
Logical vs Physical logging
Shadows to make logical logging work
Physiological logging
Fix, WAL, Force-at-commit
Page/Message/Log consistency
RM callbacks (the 1-bit resource manager)
Join, Prepare, Commit, Abort, UNDO, REDO, ....
Restart REDO/UNDO
Two phase commit (RM story is simple).

More Related Content

Similar to 10b rm

Spring Transaction Management
Spring Transaction ManagementSpring Transaction Management
Spring Transaction ManagementYe Win
 
3_Register in COA.ppt
3_Register in COA.ppt3_Register in COA.ppt
3_Register in COA.ppttommychauhan
 
Android Radio Layer Interface
Android Radio Layer InterfaceAndroid Radio Layer Interface
Android Radio Layer InterfaceChun-Yu Wang
 
Ruslan Platonov - Transactions
Ruslan Platonov - TransactionsRuslan Platonov - Transactions
Ruslan Platonov - TransactionsDmitry Buzdin
 
C lecture 4 nested loops and jumping statements slideshare
C lecture 4 nested loops and jumping statements slideshareC lecture 4 nested loops and jumping statements slideshare
C lecture 4 nested loops and jumping statements slideshareGagan Deep
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
 
MongoDB WiredTiger Internals: Journey To Transactions
  MongoDB WiredTiger Internals: Journey To Transactions  MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsM Malai
 
php[world] 2016 - You Don’t Need Node.js - Async Programming in PHP
php[world] 2016 - You Don’t Need Node.js - Async Programming in PHPphp[world] 2016 - You Don’t Need Node.js - Async Programming in PHP
php[world] 2016 - You Don’t Need Node.js - Async Programming in PHPAdam Englander
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersMarina Kolpakova
 
Open source report writing tools for IBM i Vienna 2012
Open source report writing tools for IBM i  Vienna 2012Open source report writing tools for IBM i  Vienna 2012
Open source report writing tools for IBM i Vienna 2012COMMON Europe
 
[WSO2Con EU 2018] Managing Transactions in Your Microservices Architecture
[WSO2Con EU 2018] Managing Transactions in Your Microservices Architecture [WSO2Con EU 2018] Managing Transactions in Your Microservices Architecture
[WSO2Con EU 2018] Managing Transactions in Your Microservices Architecture WSO2
 
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...
No more Big Data Hacking—Time for a Complete ETL Solution with Oracle Data In...Jérôme Françoisse
 
Loging changes in data with lumberjack
Loging changes in data with lumberjackLoging changes in data with lumberjack
Loging changes in data with lumberjackMark Van Der Loo
 

Similar to 10b rm (20)

Registers in coa
Registers in coaRegisters in coa
Registers in coa
 
Spring Transaction Management
Spring Transaction ManagementSpring Transaction Management
Spring Transaction Management
 
1230 Rtf Final
1230 Rtf Final1230 Rtf Final
1230 Rtf Final
 
13 tm adv
13 tm adv13 tm adv
13 tm adv
 
10a log
10a log10a log
10a log
 
Managing transactions 11g release 1 (10.3
Managing transactions   11g release 1 (10.3Managing transactions   11g release 1 (10.3
Managing transactions 11g release 1 (10.3
 
3_Register in COA.ppt
3_Register in COA.ppt3_Register in COA.ppt
3_Register in COA.ppt
 
Android Radio Layer Interface
Android Radio Layer InterfaceAndroid Radio Layer Interface
Android Radio Layer Interface
 
Ruslan Platonov - Transactions
Ruslan Platonov - TransactionsRuslan Platonov - Transactions
Ruslan Platonov - Transactions
 
Autonomous transaction
Autonomous transactionAutonomous transaction
Autonomous transaction
 
C lecture 4 nested loops and jumping statements slideshare



10b rm

  • 1. Gray & Reuter: Resource Manager. Resource Managers
      Jim Gray, Microsoft, Gray @ Microsoft.com
      Andreas Reuter, International University, Andreas.Reuter@i-u.de
      [Course schedule, sessions at 9:00, 11:00, 1:30, 3:30, 7:00]
      Mon: Overview, Faults, Tolerance, T Models, Party
      Tue: TP mons, Lock Theory, Lock Techniq, Queues, Workflow
      Wed: Log, ResMgr, CICS & Inet, Adv TM, Cyberbrick
      Thur: Files & Buffers, COM+, Corba, Replication, Party
      Fri: B-tree, Access Paths, Groupware, Benchmark
  • 2. Whirlwind Tour: The Actors
      Resource managers
      – provide ACID objects (transactional objects)
      – use log manager to record changes
      – use transaction manager to coordinate multi-RM changes
      – use communication manager to make transactional RPCs
      [Figure: two nodes, each with a Transaction Manager, Log Manager, Communication Manager, Log, and Resource Managers with Objects in Volatile Storage and Durable Storage]
  • 3. Whirlwind Tour: the Application Verbs
      TRID Begin_Work(context *);             /* begin a transaction */
      Boolean Commit_Work(context *);         /* commit the transaction */
      void Abort_Work(void);                  /* rollback to savepoint zero */
      savepoint Save_Work(context *);         /* establish a savepoint */
      savepoint Rollback_Work(savepoint);     /* return to savept (savept 0 = abort) */
      Boolean Prepare_Work(context *);        /* put transaction in prepared state */
      context Read_Context(void);             /* return current savepoint context */
      TRID Chain_Work(context *);             /* end current and start next trans */
      TRID My_Trid(void);                     /* return current transaction identifier */
      TRID Leave_Transaction(void);           /* set process trid null, return current id */
      Boolean Resume_Transaction(TRID);       /* set process trid to desired trid */
      enum tran_status { ACTIVE, PREPARED, ABORTING, COMMITTING, ABORTED, COMMITTED };
      tran_status Status_Transaction(TRID);   /* transaction identifier status */
  • 4. Whirlwind Tour: Types of Transaction Executions
      Shaded stuff is “undone”.
      [Figure: four sequences of Begin, Action, Save, Rollback, Restart, and Commit steps, titled A Simple Commit, A Simple Abort, A Partial Rollback, and A Persistent Transaction Surviving A System Restart]
  • 5. Whirlwind Tour: the TRID Flow
      Call graph: who calls whom. TRIDs flow on all such calls.
      The application is typically the root.
      An RM can be an application (use a transactional RM to store state).
      [Figure labels: Application, Application Servers, Resource Managers, Transaction]
  • 6. Whirlwind Tour: Normal (No Failure) Transaction Execution
      TM generates the TRID at Begin_Work() and coordinates commit.
      RM joins work, generates log records, allows commit.
      [Figure labels: Application: Begin_Work(), Commit_Work(), Work Requests. Resource Manager: Normal Functions, Transaction Callbacks, Join_Work, Lock Requests, Log Records, Work Requests. Transaction Manager: Commit Phase 1? Yes/No, Write Commit Log Record & Force Log, Commit Phase 2, ack. Lock Manager (transid). Log Manager.]
  • 7. WW Tour: The Resource Manager View
      [Figure: a Resource Manager and its interfaces]
      – its own service interface: rmCall(...) invocation and response
      – calls to other resource managers: rmCall(...)
      – TP monitor: administrative functions and callbacks to install, start, and schedule a resource manager
      – Transaction Manager functions: Identify, SaveWork, RollbackWork, Join, StatusTransaction, Leave, Resume
      – Transaction management callbacks (depends on application): Save, Prepare, Commit, UNDO, REDO, Checkpoint
  • 8. WW Tour: The Resource Manager View
      Boolean Savepoint(LSN *);           /* invoked at tran Save_Work(). Returns RM vote */
      Boolean Prepare(LSN *);             /* invoked at phase 1. Return vote on commit */
      void Commit();                      /* called at commit phase 2 */
      void Abort();                       /* called at failed commit phase 2 or abort */
      void UNDO(LSN);                     /* undo the log record with this LSN */
      void REDO(LSN);                     /* redo the log record with this LSN */
      Boolean UNDO_Savepoint(LSN);        /* vote TRUE if can return to savepoint */
      void REDO_Savepoint(LSN);           /* redo a savepoint */
      void TM_Startup(LSN);               /* TM restarting. Passes RM ckpt LSN */
      LSN Checkpoint(LSN * low_water);    /* TM checkpointing. Return RM ckpt LSN, set low water LSN */
      Boolean Join_Work(RMID, TRID);      /* become part of a transaction */
  • 9. WW Tour: The Transaction Manager
      – Transaction rollback: coordinates transaction rollback to a savepoint or abort. Rollbacks can be initiated by any participant.
      – Resource manager restart: if an RM fails and restarts, TM presents the checkpoint anchor and the RM undo/redo log.
      – System restart: TM drives local RM recovery (like RM restart) and resolves any in-doubt distributed transactions.
      – Media recovery: TM helps RM reconstruct damaged objects by providing archive copies of the object plus the log of the object since it was archived.
      – Node restart: transaction commit among independent TMs when a TM fails.
  • 10. WW Tour: When a Transaction Aborts
      At transaction rollback, TM drives undo of each RM joined to the transaction.
      Rollback can be to savepoint 0 (abort) or partial.
      [Figure labels: Application: Begin_Work(), Rollback_Work(), Work Requests. Resource Manager: Normal Functions, Transaction Callbacks, Join_Work, Lock Requests, Log Records, Work Requests. Transaction Manager: Read Transaction's Log Records & Call Undo, Write Abort Record in Log, Undo(log record), Aborted(transid). Lock Manager (transid). Log Manager.]
  • 11. WW Tour: The Transaction Manager at Restart/Recovery
      At restart, the TM reads the log and drives RM recovery.
      Single log scan. Single resolver of transactions.
      Multiple logs are possible, but more complex and more work.
      Transaction Manager steps: find checkpoint; read log forward, redo each op; at end, undo soft savepoints and transactions.
      [Figure: Log Manager supplies log records to the Transaction Manager, which issues Redo(log record) and Undo(log record) calls to the Resource Manager]
  • 12. End of Whirl-Wind Tour
  • 13. Resource Manager Concepts: Undo Redo Protocol
      DO-UNDO-REDO protocol:
      DO:   Old State -> New State + log record
      UNDO: New State + log record -> Old State
      REDO: Old State + log record -> New State
  • 14. Resource Manager Concepts: Transaction UNDO Protocol
      declare cursor for transaction_log
          select rmid, lsn                          /* a cursor on the transaction's log */
          from log                                  /* it returns the resource manager name */
          where trid = :trid                        /* and record id (log sequence number) */
          descending lsn;                           /* and returns records in LIFO order */
      void transaction_undo(TRID trid)              /* undo the specified transaction */
      { int sqlcode;                                /* event variables set by sql */
        open cursor transaction_log;                /* open an sql cursor on the trans log */
        while (TRUE)                                /* scan trans log backwards & undo each */
        {                                           /* fetch the next most recent log rec */
          fetch transaction_log into :rmid, :lsn;
          if (sqlcode != 0) break;                  /* if no more, trans is undone, end loop */
          rmid.undo(lsn);                           /* tell RM to undo that record */
        }
        close cursor transaction_log;               /* undo scan is complete, close cursor */
      };                                            /* return to caller */
      If UNDO is to a savepoint, the UNDO scan stops at the desired savepoint.
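The backward scan on slide 14 can be sketched in plain C over an in-memory log. This is only an illustration under stated assumptions: the `log_record` layout, the array-based log, the `savepoint_lsn` parameter, and the `trace_undo` callback are all hypothetical and are not part of the slides, which use an SQL cursor over a log table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory log record; the slide's log is an SQL table. */
typedef struct {
    long lsn;   /* log sequence number, ascending through the log */
    int  trid;  /* transaction that wrote the record */
    int  rmid;  /* index of the owning resource manager */
} log_record;

typedef void (*undo_fn)(long lsn);  /* a resource manager's UNDO callback */

/* Undo the records of `trid` newest first, stopping at `savepoint_lsn`
   (0 means undo everything, i.e. abort). Returns how many were undone. */
int transaction_undo(const log_record *log, size_t n, int trid,
                     long savepoint_lsn, const undo_fn *rm_undo)
{
    int undone = 0;
    assert(log != NULL || n == 0);
    for (size_t i = n; i-- > 0; ) {
        if (log[i].lsn <= savepoint_lsn)
            break;                         /* reached the savepoint */
        if (log[i].trid != trid)
            continue;                      /* another transaction's record */
        rm_undo[log[i].rmid](log[i].lsn);  /* tell that RM to undo it */
        undone++;
    }
    return undone;
}

/* Sample RM callback that only records which LSNs it was asked to undo. */
static long undo_trace[16];
static int  undo_count;
static void trace_undo(long lsn)
{
    if (undo_count < 16)
        undo_trace[undo_count++] = lsn;
}
```

Passing a nonzero `savepoint_lsn` gives the partial rollback mentioned on the slide: the scan stops as soon as it reaches a record at or before the savepoint.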
  • 15. Resource Manager Concepts: Restart REDO Protocol
      Note: REDO goes forwards, UNDO goes backwards.
      void log_redo(void)
      { declare cursor for the_log                  /* declare cursor from log start forward */
            select rmid, lsn                        /* gets RM id and log record id (lsn) */
            from log                                /* of all log records */
            ascending lsn;                          /* in FIFO order */
        open cursor the_log;                        /* open an sql cursor on the log table */
        while (TRUE)                                /* scan log forward & redo each record */
        { fetch the_log into :rmid, :lsn;           /* fetch the next log record */
          if (sqlcode != 0) break;                  /* if no more, then all redone, end loop */
          rmid.redo(lsn); }                         /* tell RM to redo that record */
        close cursor the_log;                       /* redo scan complete, close cursor */
      };                                            /* return to caller */
  • 16. Idempotence
      F(F(X)) == F(X): needed in case restart fails (and restarts).
      Redo(Redo(old_state, log), log) = Redo(new_state, log) = new_state
      Undo(Undo(new_state, log), log) = Undo(old_state, log) = old_state
      [Figure: Old State and New State, linked by undo and redo of a log record]
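One common way to get the idempotence property above is the page-LSN test that the later summary slide names ("page LSN makes REDO idempotent"). The sketch below assumes a hypothetical page holding a single integer and a log record carrying a delta; an increment is not idempotent on its own, so the LSN comparison is what makes a repeated REDO harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page and log record layouts, for illustration only. */
typedef struct { long lsn; int balance; } page_t;
typedef struct { long lsn; int delta;   } log_rec_t;

/* Apply the record only if the page has not seen it yet.
   Returns 1 if the update was applied, 0 if it was skipped. */
int redo(page_t *p, const log_rec_t *r)
{
    assert(p != NULL && r != NULL);
    if (p->lsn >= r->lsn)
        return 0;              /* page already reflects this record */
    p->balance += r->delta;    /* the operation itself is not idempotent */
    p->lsn = r->lsn;           /* remember the newest record applied */
    return 1;
}
```

Calling `redo` twice with the same record leaves the page exactly as one call does, which is the `Redo(Redo(x, log), log) = Redo(x, log)` requirement.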
  • 17. Testable State: Can Tell If It Happened
      IF operation not idempotent AND state not testable
      THEN recovery is impossible
      ELSE for F in {UNDO, REDO}:
          not testable: WHILE (! ACK) F(F(X))
          testable:     WHILE (not desired state) {F(x)}
      [Figure: a test distinguishes Old State from New State when starting from an Unknown State]
  • 18. Real Operations: Can Not Be Undone
      Defer operations until commit is assured.
      Perform them as part of Phase 2 of commit.
      If a real operation must be undone for some reason, generate a compensation log record to be processed by some higher authority.
      [Figure: DO leaves the Old State unchanged and writes a log record; REDO at Commit produces the New State; UNDO after that point yields a compensation log record]
  • 19. Example: Communications Session RM
      Ops are idempotent (sequence numbers) and testable (sequence numbers).
      Session And Message Recovery Actions
      Sender
      – DO: log message & seqno, send
      – UNDO: send cancellation (generates log record)
      – REDO: resend message
      – COMMIT: send any deferred (real) messages
      Receiver
      – DO: establish savepoint, log message & seqno, acknowledge
      – UNDO: log cancellation message, return to savepoint, acknowledge
      – REDO: if not duplicate <normal DO processing> else just acknowledge
      – COMMIT: do it
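The "if not duplicate, normal processing, else just acknowledge" rule can be sketched with a per-session sequence number. The `session_receiver` type, its field names, and the out-of-order case are assumptions made for this example; the slide only says that sequence numbers make the operations idempotent and testable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receiver-side session state. */
typedef struct {
    unsigned expected_seqno;  /* next sequence number not yet processed */
    int processed;            /* messages given normal DO processing */
    int acks_sent;            /* acknowledgements sent back */
} session_receiver;

typedef enum { MSG_PROCESSED, MSG_DUPLICATE, MSG_OUT_OF_ORDER } receive_result;

/* Handle one arriving message identified by its sequence number. */
receive_result session_receive(session_receiver *r, unsigned seqno)
{
    assert(r != NULL);
    if (seqno > r->expected_seqno)
        return MSG_OUT_OF_ORDER;   /* gap: assumed to be dropped without ack */
    r->acks_sent++;                /* both remaining cases acknowledge */
    if (seqno < r->expected_seqno)
        return MSG_DUPLICATE;      /* resent message: just acknowledge */
    r->processed++;                /* normal DO processing happens once */
    r->expected_seqno++;
    return MSG_PROCESSED;
}
```

Because a resent message is detected by comparing sequence numbers, the sender's REDO (resend) can run any number of times without the receiver processing the message twice.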
  • 20. Kinds of Logging
      Physical: keep old and new value of container (page, file, ...)
      – Pro: simple; allows recovery of physical object (e.g. broken page)
      – Con: generates LOTS of log data
      Logical: keep call params such that you can compute F(x), F^-1(x)
      – Pro: sounds simple; compact log
      – Con: doesn't work (wrong failure model). Operations do not fail cleanly.
  • 21. Sample Physical LOG RECORD
      Ordinary sequential insert is OK.
      Update of sorted (B-tree) page: update LSN, update page space map, update pointer to record, insert record at correct spot (move 1/2 the others).
      Essentially writes whole page (old and new): 16KB log records for 100-byte updates.
      struct compressed_log_record_for_page_update
      { int opcode;                 /* opcode will say compressed page update */
        filename fname;             /* name of file that was updated */
        long pageno;                /* page that was updated */
        long offset;                /* offset within page that was updated */
        long length;                /* length of field that was updated */
        char old_value[length];     /* old value of field */
        char new_value[length];     /* new value of field */
      };
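The struct on slide 21 declares two arrays sized by a runtime `length`, which standard C does not allow inside a struct. A minimal sketch of one way to express the same record uses a single flexible array member that holds the old value followed by the new value. The names `page_update_rec`, `make_rec`, `redo_rec`, and `undo_rec` are hypothetical, and the opcode, file name, and page number fields are left out to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compressed page-update record. */
typedef struct {
    long offset;    /* offset within page that was updated */
    long length;    /* length of the field that was updated */
    char values[];  /* old value (length bytes), then new value (length bytes) */
} page_update_rec;

/* Build a record capturing the current bytes as the old value. */
page_update_rec *make_rec(const char *page, long offset, long length,
                          const char *new_value)
{
    assert(offset >= 0 && length > 0);
    page_update_rec *r = malloc(sizeof *r + 2 * (size_t)length);
    if (r == NULL)
        return NULL;
    r->offset = offset;
    r->length = length;
    memcpy(r->values, page + offset, (size_t)length);
    memcpy(r->values + length, new_value, (size_t)length);
    return r;
}

/* REDO installs the new value; UNDO restores the old one. */
void redo_rec(char *page, const page_update_rec *r)
{
    memcpy(page + r->offset, r->values + r->length, (size_t)r->length);
}

void undo_rec(char *page, const page_update_rec *r)
{
    memcpy(page + r->offset, r->values, (size_t)r->length);
}
```

Since both functions overwrite the field with a fixed value, applying either one repeatedly gives the same page, so this style of physical record is idempotent without any extra test.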
  • 22. Sample Logical LOG RECORD
      Very compact.
      Implies page update(s) for the record (may be many pages long).
      Implies index updates (may be many indices on base table).
      struct logical_log_record_for_insert
      { int opcode;                 /* opcode will say insert */
        filename fname;             /* name of file that was updated */
        long length;                /* length of record that was updated */
        char record[length];        /* value of record */
      };
  • 23. The Trouble with Logical Logging
      Logical logging needs to start UNDO/REDO with an action-consistent state: no half-completed operations.
      For example, for insert(table, record), ALL or NONE of the indices should be updated when logical UNDO/REDO is invoked.
      Problem: the failure model is page & message action consistency (Lampson/Sturgis model of Chapter 3).
      Actions can fail due to:
      – Logic: e.g. duplicate key
      – Limit: ran out of space
      – Contention: deadlock
      – Media: broken page or session
      – System: computer failure/restart
  • 24. Making Logical Logging Work: Shadows
      Keep old copy of each page.
      Reset page to old copy at abort (no undo log).
      Discard old copy at commit.
      Handles all online failures due to:
      – Logic: e.g. duplicate key
      – Limit: ran out of space
      – Contention: deadlock
      Problem: forces page locking, only one updater per page.
      What about restart? Need to atomically write out all changed pages.
  • 25. Making Logical Logging Work: Shadows
      Perform the same shadow trick at disc level.
      Keep shadow copy of old pages.
      Write out new pages.
      In one careful write, write out new page root. This makes the update atomic.
      [Figure: A Shadow Update. Old and New versions, each with a Directory and Free Space Bit Map pointing at data pages A, B, C]
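The disc-level shadow trick can be sketched as two directories and a single root switch. Everything here is an in-memory stand-in chosen for the example: `shadow_store`, the slot counts, and the function names are hypothetical, and free slots are never reclaimed, which a real free space bit map would handle.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4   /* logical pages */
#define NSLOTS 8   /* physical slots on the simulated disc */

typedef struct { int slot_of[NPAGES]; } directory;

typedef struct {
    int slots[NSLOTS];   /* simulated disc pages */
    directory dir[2];    /* current directory and its shadow successor */
    int root;            /* which directory is current: the one careful write */
    int next_free;       /* naive allocator, slots are never reclaimed */
} shadow_store;

void store_init(shadow_store *s)
{
    for (int p = 0; p < NPAGES; p++) {
        s->slots[p] = 0;
        s->dir[0].slot_of[p] = p;
    }
    s->dir[1] = s->dir[0];
    s->root = 0;
    s->next_free = NPAGES;
}

int store_read(const shadow_store *s, int page)
{
    assert(page >= 0 && page < NPAGES);
    return s->slots[s->dir[s->root].slot_of[page]];
}

/* Write a new page version into a fresh slot; old version stays intact.
   Returns 0 on success, -1 if no free slot is left. */
int store_write(shadow_store *s, int page, int value)
{
    assert(page >= 0 && page < NPAGES);
    if (s->next_free >= NSLOTS)
        return -1;
    int slot = s->next_free++;
    s->slots[slot] = value;
    s->dir[1 - s->root].slot_of[page] = slot;  /* only the new directory changes */
    return 0;
}

/* Commit: switch the root, then prepare the other directory for next time. */
void store_commit(shadow_store *s)
{
    s->root = 1 - s->root;
    s->dir[1 - s->root] = s->dir[s->root];
}

/* Abort: forget the new directory; the old pages were never touched. */
void store_abort(shadow_store *s)
{
    s->dir[1 - s->root] = s->dir[s->root];
}
```

Readers always go through the current root, so uncommitted writes are invisible until `store_commit` flips it, and abort needs no undo log.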
  • 26. Shadows
      Pro: simple; not such a bad deal with non-volatile RAM.
      Con: page locking; extra space; extra overhead (for page maps); extra IO; declusters sequential data.
  • 27. Compromise: Physio-Logical Logging
      Physical to a "page" (physical container).
      Logical within a "page".
      Keep old and new value of container (page, file, ...)
      – Pro: simple; allows recovery of physical object (e.g. broken page)
      – Con: generates LOTS of log data
  • 28. Logical vs Physio-logical Logging
      Insert record r into table A (Table A has Index B and Index C).
      Logical log record: insert, A, r
      Physiological log records: insert, A, page 508, r; insert, B, page 72, s; insert, C, page 94, t
      Note: physical log records would be bigger for sorted pages.
  • 29. Physiological Logging Rules
      Complex operations are a sequence of simple operations on pages and messages.
      Each operation is constructed as a mini-transaction:
      – lock the object in exclusive mode
      – transform the object
      – generate an UNDO-REDO log record
      – record log LSN in object
      – unlock the object
      Action-consistent object: when the object semaphore is free, no ops are in progress.
      Log consistency: the log contains log records of all complete page/msg actions.
  • 30. Physiological Logging Rules: Online Operation, Only Need the Fix Rule
      Each operation is structured as a mini-transaction.
      Each operation generates an UNDO record.
      No page operation fails with the semaphore set (the exception handler must clean up state and UNFIX any pages).
      Then rollback can be physical to a page/session/container and logical within a page/session/container.
  • 31. Physiological Logging Rules: Restart Operation, Need WAL and F@C
      Need a page-action-consistent disc state:
      – pages are action consistent
      – committed actions can be redone from the log
      – uncommitted actions can be undone from the log
      WAL (Write Ahead Log): write undo/redo log records before overwriting a disc page; only write action-consistent pages.
      Force-Log-At-Commit: make transaction log records durable at commit.
  • 32. Physiological Logging Rules: WAL and F@C
      WAL (Write Ahead Log), to write a page:
          get page semaphore
          copy page
          give page semaphore        /* avoids holding semaphore during IO */
          Force_log(Page(LSN))       /* WAL logic, probably already flushed */
          write copy to disc
      WAL gives idempotence and testability.
      Force-Log-At-Commit, at commit phase 1: Force_log(transaction.max_lsn)
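The two rules can be sketched with a simulated log that tracks only its end LSN and its durable LSN. The types `wal_log` and `wal_page` and all function names are hypothetical, the page semaphore is omitted because the sketch is single-threaded, and "disc" is just a second struct in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated log: only the LSN bookkeeping is modeled. */
typedef struct { long end_lsn; long durable_lsn; } wal_log;
typedef struct { long lsn; int data; } wal_page;

/* Append a log record to the volatile log tail and return its LSN. */
long log_insert(wal_log *log)
{
    return ++log->end_lsn;
}

/* Make the log durable up to and including `lsn`. */
void log_force(wal_log *log, long lsn)
{
    assert(lsn <= log->end_lsn);
    if (lsn > log->durable_lsn)
        log->durable_lsn = lsn;
}

/* Update the volatile page and stamp it with the new record's LSN. */
void update_page(wal_log *log, wal_page *vol, int value)
{
    vol->data = value;
    vol->lsn = log_insert(log);
}

/* WAL: force the log up to the page LSN before overwriting the disc copy. */
void write_page(wal_log *log, const wal_page *vol, wal_page *disc)
{
    wal_page copy = *vol;       /* copy first, as the slide does */
    log_force(log, copy.lsn);
    *disc = copy;
}

/* Force-Log-At-Commit: the transaction's last record must be durable. */
void commit_transaction(wal_log *log, long max_lsn)
{
    log_force(log, max_lsn);
}
```

After any `write_page` the persistent page LSN is at most the durable log LSN, and after `commit_transaction` the commit LSN is at most the durable log LSN, which are the assertions the slides attribute to WAL and F@C.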
  • 33. WAL & F@C in Pictures
      [Figure: timeline of Volatile Page Versions (VVlsn), Volatile Log Records (VLlsn), Persistent Page Versions (PVlsn), and Durable Log Records (DLlsn)]
      online: VVlsn = VLlsn
      restart: DLlsn <= VVlsn, PVlsn <= DLlsn
      commit: commit_lsn <= DLlsn
      At restart all volatile memory is reset and must be reconstructed from persistent memory.
      restart: PVlsn <= DLlsn, commit_lsn <= DLlsn
      FIX, WAL and F@C assure these assertions.
  • 34. The One Bit Resource Manager
      Manages an array of transactional bits (the free space bit map).
      i = get_bit();      /* gets a free bit and sets it */
      give_bit(i);        /* returns a free bit (when transaction commits) */
  • 35. The Bitmap and Its Log Records
      The Data Structure
      struct {                       /* layout of the one-bit RM data structure */
        LSN lsn;                     /* page LSN for WAL protocol */
        xsemaphore sem;              /* semaphore regulates access to the page */
        Boolean bit[BITS];           /* page.bit[i] = TRUE => bit[i] is free */
      } page;                        /* allocates the page structure */
      The Log Records
      struct                         /* log record format for the one-bit RM */
      { int index;                   /* index of bit that was updated */
        Boolean value;               /* new value of bit[index] */
      } log_rec;                     /* log record used by the one-bit RM */
      const int rec_size = sizeof(log_rec);   /* size of the log record body */
  • 36. Page and Log Consistency for 1-Bit RM
      Data is dirty if it reflects an uncommitted transaction update. Otherwise, data is clean.
      Page consistency:
      – No clean free bit has been given to any transaction.
      – Every clean busy bit was given to exactly one transaction.
      – Dirty bits are locked in X mode by updating transactions.
      – The page.lsn reflects the most recent log record for the page.
      Log consistency:
      – The log contains a record for every completed mini-transaction update to the page.
  • 37. give_bit()
      get_bit() & give_bit(i) temporarily violate page consistency.
      The mini-transaction holds the semaphore while violating consistency.
      It makes page & log mutually consistent before releasing the semaphore.
      => each mini-transaction observes a consistent page state.
      void give_bit(int i)                                  /* free a bit */
      { if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0))     /* lock bit */
        { Xsem_get(&page.sem);                              /* get page sem */
          page.bit[i] = TRUE;                               /* free the bit */
          log_rec.index = i;                                /* generate log rec */
          log_rec.value = TRUE;                             /* saying bit is free */
          page.lsn = log_insert(log_rec,rec_size);          /* write log rec & update lsn */
          Xsem_give(&page.sem); }                           /* page consistent */
        else                                                /* if lock failed, caller doesn't own bit, */
          Abort_Work();                                     /* in that case abort caller's trans */
        return; };
  • 38. get_bit()
      int get_bit(void)                                     /* allocate a bit and return its index */
      { int i;                                              /* loop variable */
        Xsem_get(&page.sem);                                /* get the page semaphore */
        for (i = 0; i < BITS; i++)                          /* loop looking for a free bit */
        { if (page.bit[i])                                  /* if bit is free, may be dirty (so locked) */
          { if (LOCK_GRANTED == lock(i,LOCK_X,LOCK_LONG,0)) /* lock bit */
            { page.bit[i] = FALSE;                          /* got lock on it, so it was free */
              log_rec.value = FALSE;                        /* generate log rec describing update */
              log_rec.index = i;
              page.lsn = log_insert(log_rec,rec_size);      /* write log rec & update lsn */
              Xsem_give(&page.sem);                         /* page now consistent, give up sem */
              return i; }                                   /* return to caller */
          };                                                /* else lock bounce so bit dirty */
        };                                                  /* try next free bit */
        Xsem_give(&page.sem);                               /* if no free bits, give up semaphore */
        Abort_Work();                                       /* abort transaction */
        return -1; };                                       /* returns -1 if no bits are available */
  • 39. Compensation Logging
      Undo may generate a log record recording the undo step.
      This makes the page LSN monotonic.
      A similar technique was used for the Communication Manager (the session sequence number was monotonic).
      [Figure: UNDO takes the New State and a log record to the logical Old State and writes a compensation log record]
  • 40. 1-bit RM UNDO Callback
      void undo(LSN lsn)                                    /* undo a one-bit RM operation */
      { int i;                                              /* bit index */
        Boolean value;                                      /* old bit value from log rec to be undone */
        log_rec_header header;                              /* buffer to hold log record header */
        rec_size = log_read_lsn(lsn,header,0,log_rec,big);  /* read log rec */
        Xsem_get(&page.sem);                                /* get the page semaphore */
        i = log_rec.index;                                  /* get bit index from log record */
        value = ! log_rec.value;                            /* get complement of new bit value */
        page.bit[i] = value;                                /* update bit to old value */
        log_rec.value = value;                              /* make a compensation log record */
        page.lsn = log_insert(log_rec,rec_size);            /* log it and bump page lsn */
        Xsem_give(&page.sem);                               /* free the page semaphore */
        return; }
  • 41. 1-bit RM Checkpoint Callback
      LSN checkpoint(LSN * low_water)                       /* copy 1-page RM state to persistent store */
      { Xsem_get(&page.sem);                                /* get the page semaphore */
        *low_water = log_flush(page.lsn);                   /* WAL force up to page lsn, and set low water mark */
        write(file,page,0,sizeof(page));                    /* write page to persistent memory */
        Xsem_give(&page.sem);                               /* give page semaphore */
        return NULLlsn; }                                   /* return checkpoint lsn (none needed) */
  • 42. 1-bit RM REDO Callback
      void redo(LSN lsn)                                    /* redo a free space operation */
      { int i;                                              /* bit index */
        Boolean value;                                      /* new bit value from log rec to be redone */
        log_rec_header header;                              /* buffer to hold log record header */
        rec_size = log_read_lsn(lsn,header,0,log_rec,big);  /* read log record */
        i = log_rec.index;                                  /* get bit index */
        lock(i,LOCK_X,LOCK_LONG,0);                         /* get lock on the bit (often not needed) */
        Xsem_get(&page.sem);                                /* get the page semaphore */
        if (page.lsn < lsn)                                 /* if bit version older than log record */
        { value = log_rec.value;                            /* then redo the op. get new bit value */
          page.bit[i] = value;                              /* apply new bit value to bit */
          page.lsn = lsn; }                                 /* advance the page lsn */
        Xsem_give(&page.sem);                               /* free the page semaphore */
        return; };
  • 43. 1-Bit RM Noise Callbacks
      Boolean prepare(LSN * lsn)                            /* 1-bit RM has no phase 1 work */
      { *lsn = NULLlsn; return TRUE; };
      void Commit(void)                                     /* Commit releases locks & */
      { unlock_class(LOCK_LONG, TRUE, MyRMID()); };         /* returns */
      void Abort(void)                                      /* Abort releases all locks & */
      { unlock_class(LOCK_LONG, TRUE, MyRMID()); };         /* returns */
      Boolean savepoint(LSN * lsn)                          /* no work to do at savepoint */
      { *lsn = NULLlsn; return TRUE; };
      void UNDO_savepoint(LSN lsn)                          /* rollback work or abort transaction */
      { if (savepoint == 0)                                 /* if at savepoint zero (abort) */
          unlock_class(LOCK_LONG, TRUE, MyRMID());          /* release all locks */
      };
  • 44. Summary
      – Model: complex actions are a page/message action sequence.
      – LSN: each page carries an LSN and a semaphore.
      – ReadFix: read actions get the semaphore in shared mode.
      – WriteFix: update actions get the semaphore in exclusive mode, generate one or more log records covering the page, advance the page LSN to match the highest LSN, and give the semaphore.
      – WAL: log_flush(page.LSN) before overwriting the persistent page.
      – F@C: force all log records up to the commit LSN at commit.
      – Compensation logging: invalidate an undone log record with a compensating log record.
      – Idempotence via LSN: the page LSN makes REDO idempotent.
  • 45. Two Phase Commit
      Getting two or more logs to agree. Getting two or more RMs to agree, atomically and durably, even in case one of them fails and restarts.
      The TM phases:
      – Prepare: Invoke each joined RM, asking for its vote.
      – Decide: If all vote yes, durably write the commit log record.
      – Commit: Invoke each joined RM, telling it the commit decision.
      – Complete: Write the commit completion record when all RMs ACK.
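The four TM phases can be sketched as a single coordinator function. This is an illustrative sketch under stated assumptions, not the book's transaction manager: the `RM` struct with its callback pointers, the `TMLog` flags standing in for forced and lazy log writes, and the `toy_rm` helper are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct RM {                     /* hypothetical joined resource manager */
    bool (*prepare)(struct RM *);       /* phase 1: returns the RM's vote */
    void (*commit)(struct RM *);        /* phase 2: told the commit decision */
    void (*on_abort)(struct RM *);      /* told the abort decision */
    bool vote;                          /* toy state: what prepare() answers */
    int  commits, aborts;               /* toy state: callback counters */
} RM;

typedef struct {                        /* stand-in for the TM's log records */
    bool commit_logged;                 /* commit record durably written */
    bool completion_logged;             /* completion record written */
} TMLog;

static bool toy_prepare(RM *rm) { return rm->vote; }
static void toy_commit(RM *rm)  { rm->commits++; }
static void toy_abort(RM *rm)   { rm->aborts++; }

/* Build a toy RM that votes as requested. */
RM toy_rm(bool vote)
{
    RM rm = { toy_prepare, toy_commit, toy_abort, vote, 0, 0 };
    return rm;
}

/* Returns true if the transaction committed, false if it aborted. */
bool two_phase_commit(RM *rms[], size_t n, TMLog *log)
{
    /* Prepare: ask each joined RM for its vote; one "no" is enough to abort. */
    bool all_yes = true;
    for (size_t i = 0; i < n && all_yes; i++)
        all_yes = rms[i]->prepare(rms[i]);

    if (!all_yes) {
        for (size_t i = 0; i < n; i++)
            rms[i]->on_abort(rms[i]);
        return false;
    }

    /* Decide: the commit record is the commit point. */
    log->commit_logged = true;

    /* Commit: tell each joined RM the decision. */
    for (size_t i = 0; i < n; i++)
        rms[i]->commit(rms[i]);

    /* Complete: all RMs have acknowledged (calls returned). */
    log->completion_logged = true;
    return true;
}
```

In this sketch a returned call plays the role of the ACK; timeouts and restart handling are deliberately not modeled.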
  • 46. Centralized Case of Two Phase Commit
      Each participant (TM & RM) goes through a sequence of states. These generate log records.
      [State diagram: Null, Active, Prepared, Committing, Committed, Aborting, Aborted]
  • 47. Examples
      Committed               Aborted
      begin                   begin
      DO rm1                  DO rm1
      DO rm2                  DO rm2
      DO rm2                  DO rm2
      prepare rm2 {locks}     UNDO rm2
      commit {rm1, rm2}       UNDO rm2
      complete                UNDO rm1
                              UNDO begin {rm1, rm2}
                              complete
  • 48. Transitions in Case of Restart
      [State diagram: Null, Active, Prepared, Committing, Committed, Aborting, Aborted]
      The active state is not persistent; the others are persistent, for both TM and RM. Log records make them persistent (redo). The TM tries to drive states to the right (to committed, aborted).
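One way to read "drive states to the right" is as a function from the state found at restart to the state the TM pushes toward. The sketch below is an interpretation, not taken from the slides: the enum names, the `Decision` type, and the treatment of a prepared participant as waiting until the decision is known are assumptions based on the usual two phase commit rules.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    TX_NULL, TX_ACTIVE, TX_PREPARED,
    TX_COMMITTING, TX_COMMITTED,
    TX_ABORTING, TX_ABORTED
} TxState;

/* Assumption: what is known about the commit decision after restart. */
typedef enum { DECISION_UNKNOWN, DECISION_COMMIT, DECISION_ABORT } Decision;

/* Given the state recovered at restart, return the state to drive toward. */
TxState restart_resolve(TxState found, Decision decision)
{
    switch (found) {
    case TX_ACTIVE:                 /* active is volatile: no state record */
    case TX_ABORTING:               /* survived, so the work is undone     */
        return TX_ABORTED;
    case TX_COMMITTING:             /* commit record is durable: finish it */
        return TX_COMMITTED;
    case TX_PREPARED:               /* in doubt until the decision arrives */
        if (decision == DECISION_COMMIT) return TX_COMMITTED;
        if (decision == DECISION_ABORT)  return TX_ABORTED;
        return TX_PREPARED;
    default:                        /* null, committed, aborted: unchanged */
        return found;
    }
}

/* A state is complete when restart has nothing left to do for it. */
bool is_complete(TxState s)
{
    return s == TX_NULL || s == TX_COMMITTED || s == TX_ABORTED;
}
```

Under this reading, only the prepared state can remain unresolved after restart, and only while the decision is still unknown.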
  • 49. Successful two phase commit
      Message/call flow from the TM to each RM joined to the transaction. If TM and RM share the same log, the RM FORCE can piggyback on the TM FORCE. One IO to commit a transaction (fewer if commit is grouped).
      [Message-flow diagram between Coordinator and Participant]
      – Messages: Prepare (Coordinator to Participant), yes (Participant to Coordinator), Commit (Coordinator to Participant), Ack when durable (Participant to Coordinator).
      – Coordinator actions: LocalPrepare (lazy), WriteCommitRecordInLog (force), LocalCommitWork (lazy), WriteCompletionRecordInLog (lazy).
      – Participant actions: LocalPrepare, WritePrepareRecordInLog (force), LocalCommitWork, WriteCompletionRecordInLog (lazy).
      – States on both sides: Active, Prepared, Committing, Committed.
  • 50. Abort Two Phase Commit
      If an RM sends "NO" or does not respond (timeout), the TM starts abort. It calls UNDO for each of the transaction's log records. It may stop at a savepoint. At begin_trans it calls the ABORT() callback of each joined RM.
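The backward UNDO scan described on this slide can be sketched as follows. The record layout (`LogRec`, `RecKind`), the `rollback` function, and the `abort_calls` counter standing in for the ABORT() callbacks are hypothetical. Compensation log records are not written here, so this shows only the scan and its two stopping points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REC_BEGIN, REC_UPDATE, REC_SAVEPOINT } RecKind;

typedef struct {                /* hypothetical transaction log record */
    RecKind kind;
    int     savepoint;          /* savepoint number (REC_SAVEPOINT only) */
    int     index;              /* bit index (REC_UPDATE only)           */
    bool    old_value;          /* value to restore on UNDO              */
} LogRec;

/* Scan the transaction's log backward, undoing updates.
   target != 0: stop at that savepoint and return false (still alive).
   target == 0: continue to the begin record, count one round of ABORT()
   callbacks in *abort_calls, and return true (transaction aborted).   */
bool rollback(const LogRec *log, size_t n, int target,
              bool bits[], int *abort_calls)
{
    for (size_t i = n; i-- > 0; ) {
        const LogRec *r = &log[i];
        switch (r->kind) {
        case REC_UPDATE:
            bits[r->index] = r->old_value;      /* UNDO this record */
            break;
        case REC_SAVEPOINT:
            if (target != 0 && r->savepoint == target)
                return false;                   /* stop at the savepoint */
            break;
        case REC_BEGIN:
            (*abort_calls)++;                   /* ABORT() of joined RMs */
            return true;
        }
    }
    return true;
}
```

A rollback to a nonzero savepoint leaves earlier updates in place, while a rollback to savepoint zero undoes everything and ends with the abort callbacks.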
  • 51. Distributed two phase commit
      Tracking joined TMs: the communications manager helps, much as TRPC helps in the local case. The root TM owes a Prepare/Commit/Abort message to each joined TM. A joined TM does a "local" commit.
      [Diagram: at node A, a call carrying (trid, data) passes through the Communications Manager, which on the first use of the trid tells the Transaction Manager "trid is outgoing to B". The session carries (trid, data) to the callee, where the Communications Manager on the first use of the trid tells its Transaction Manager "trid is incoming from A".]
  • 52. Full Transaction State Diagram
      Next section explains how these states are implemented.
      [State diagram: null (= save point 0), Begun (= save point 1), save point n, persistent save point n, active, prepared, committing, committed, aborting, aborted. States are grouped as Volatile States, Persistent States, and Durable States, and as live states and complete states.]
  • 53. Summary of Resource Manager Concepts
      – DO/UNDO/REDO: idempotent, testable, real operations
      – Logical vs physical logging
      – Shadows to make logical logging work
      – Physiological logging
      – Fix, WAL, Force-at-commit
      – Page/Message/Log consistency
      – RM callbacks (the 1-bit resource manager): Join, Prepare, Commit, Abort, UNDO, REDO, ....
      – Restart REDO/UNDO
      – Two phase commit (RM story is simple).