DBMS Transaction
“A database management system (DBMS) allows a person to organize, store, and retrieve data from a computer. It is a way of communicating with a computer’s ‘stored memory.’”
1. History
Timeline
- 1960s: First DBMSs, as a replacement for plain file storage, which suffered from data encoding and structuring problems and from incompatible file formats between applications.
- 1970: Edgar F. Codd published an academic paper titled "A Relational Model of Data for Large Shared Data Banks". That paper introduced a new way to model data: a set of cross-linked tables.
- 1971: The CODASYL approach was a very complicated system and required substantial training. Searching for records could be accomplished by one of three techniques:
  - Using the primary key (also known as the CALC key)
  - Following relationships (also called sets) from one record to another
  - Scanning all records in sequential order
- 1974: IBM System R, a database system built as a research project at IBM's San Jose Research Laboratory, beginning in 1974. It was the first system with:
  - SQL language
  - Transaction processing
- 1979: RSI introduced Oracle V2 (Version 2) as the first commercially available SQL-based RDBMS, followed by DB2, SAP Sybase ASE, and Informix.
Three success keys of DBMS
Relational model
2. Why transactions?
“A database transaction symbolizes a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database.”
Reliability
Would you be happy if a single debit from your account ended up being executed twice by the bank’s database?
Reliability
You book a return flight, but in the end the system has only recorded the outward flight.
ACID properties
▷ Atomicity
▷ Consistency
▷ Isolation
▷ Durability
Atomicity
All the operations of a transaction are indivisible, which means that either:
- all operations are performed properly, or
- no operation is performed and everything is cancelled.
Illustration: in a financial transaction, an amount has to be debited from one account and credited to another.
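The debit/credit illustration can be sketched as follows. This is a minimal sketch and not the deck's own material: it assumes Python's built-in sqlite3 module with an in-memory database instead of Postgres, and the `transfer` and `balances` helpers and the table layout are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical schema, reusing the account names from the hands-on slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Germs", 100), ("Rihanna", 1000)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Debit src and credit dst; either both updates are kept or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            credited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            if credited.rowcount != 1:
                raise LookupError(f"unknown account: {dst}")
        return True
    except LookupError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "Germs", "Nobody", 50)      # the credit fails, so the debit is undone
after_failure = balances(conn)
succeeded = transfer(conn, "Germs", "Rihanna", 50)  # both updates are committed together
after_success = balances(conn)
```

The failed transfer leaves the Germs balance untouched even though the debit statement had already run, which is the "everything is cancelled" branch of atomicity.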
Consistency
Aka correctness:
A database transaction must change affected data only in allowed ways.
Any data written to the database must be valid according to all defined rules (including constraints, cascades, triggers, and any combination thereof).
Consistency
- of the database: the guarantee that database constraints are not violated, particularly once a transaction commits.
- between transactions: the guarantee that any transaction started in the future necessarily sees the effects of other transactions committed in the past.
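Database-level consistency can be illustrated with a constraint that rejects an invalid write. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than Postgres, and the "balance must not be negative" rule, the `withdraw` helper and the table layout are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business rule: a balance may never become negative.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('Germs', 100)")
conn.commit()


def withdraw(conn, name, amount):
    """Return True if the withdrawal was committed, False if a rule rejected it."""
    try:
        with conn:  # rolls back automatically when the constraint is violated
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, name)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balance_of(conn, name):
    return conn.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()[0]


rejected = withdraw(conn, "Germs", 500)   # would leave -400, so the database refuses it
balance_after_rejection = balance_of(conn, "Germs")
accepted = withdraw(conn, "Germs", 40)
balance_after_withdrawal = balance_of(conn, "Germs")
```

The rule lives in the schema, so every transaction is held to it regardless of which application issued the write.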
Isolation
Isolation determines how the operations of a transaction are visible to other users and systems.
A transaction should not be disturbed by other transactions.
Durability
Durability is the ACID property which guarantees that
transactions that have committed will survive permanently.
If a flight booking reports that a seat has successfully been
booked, then the seat will remain booked even if the system
crashes
3. Isolation levels
Isolation levels
… and read phenomena.
- The concepts below are part of pessimistic concurrency control.
- It assumes that conflicts between transactions can happen often, and locks data records when a user starts to update them, so other users cannot update that data until the lock is released.
Isolation levels
Examples based on the Postgres implementation.
See https://github.com/GermainSIGETY/DBMS-transaction-cheat-sheet
Isolation levels
4 isolation levels:
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
against 4 read phenomena:
- Dirty read
- Nonrepeatable read
- Phantom read
- Serialization anomaly
Safety pause
Read Uncommitted isolation level
Lowest isolation level. With this level, the following can occur:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
… but this isolation level does not really exist in Postgres: it can be requested, but it behaves like Read Committed, which is the lowest level Postgres implements.
Read Committed isolation level
Default isolation level in Postgres. With this level, the read phenomenon fixed is:
- Dirty Read
But the following can still occur:
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
What is a dirty read ?
A transaction reads data written by a concurrent uncommitted
transaction.
Hands on a dirty read fixed:

M1, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    UPDATE ACCOUNT SET BALANCE=500 WHERE NAME='Germs';
  T1 modifies a row (amount 100 -> 500) without committing it.
M2, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs account; the amount returned is still 100.
M3, Transaction 1:
    COMMIT;
  T1 commits.
M4, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs account; the amount returned is now 500.

Repeatable Read isolation level
Using this level in Postgres, the read phenomena fixed are:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
But the following can still occur:
- Serialization Anomaly
What is a Nonrepeatable read ?
A transaction re-reads data it has previously read and finds that
data has been modified by another transaction (that committed
since the initial read).
Hands on a Nonrepeatable read fixed:

M1, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs account; the amount returned is 100.
M2, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    UPDATE ACCOUNT SET BALANCE=500 WHERE NAME='Germs';
    COMMIT;
  T1 modifies a row (amount 100 -> 500) and commits.
M3, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs account again (in the same transaction opened at M1); the amount returned is still 100.

What is a phantom read ?
A transaction re-executes a query returning a set of rows that satisfy
a search condition and finds that the set of rows satisfying the
condition has changed due to another recently-committed
transaction.
Hands on a phantom read fixed:

M1, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs accounts; only one row is returned.
M2, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    INSERT INTO ACCOUNT (BALANCE, NAME) VALUES (20, 'Germs');
    COMMIT;
  T1 adds a second account for Germs and commits.
M3, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  T2 reads the Germs accounts again (in the same transaction opened at M1); still only one row is returned.

Hands on a phantom read fixed:
The same behaviour occurs if T1 deletes a row instead of adding one (as above): T2 still sees the same number of rows as before the deletion.
Serializable isolation level
Using this level, the read phenomena fixed are:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
What is a Serialization anomaly ?
The result of successfully committing a group of transactions is
inconsistent with all possible orderings of running those transactions
one at a time.
What is a Serialization anomaly?
Imagine a flow of 2 transactions:
- both perform write operations
- the write operations occur while both transactions are still open
What is a Serialization anomaly?
The two transactions are said to be serializable if
- the result in the DB is identical to the one obtained by executing T1 completely then T2 completely, or T2 completely then T1 completely.
The two transactions are said to be not serializable if
- the result in the DB matches neither of those two orderings.
What is a Serialization anomaly?
Postgres cannot perform magic tricks: if a transaction provokes a serialization anomaly, Postgres returns an error for one of the transactions, and that transaction is rolled back.
It’s up to you, valiant developer, to handle this error in your code :)
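One common way to handle this error is to re-run the whole transaction when the database reports a serialization failure. The sketch below shows only the retry shape and makes several assumptions: `SerializationFailure` is a hypothetical stand-in for whatever exception your driver raises for SQLSTATE 40001, `run_with_retry` and `flaky_transaction` are illustrative names, and no real database is involved.

```python
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

    sqlstate = "40001"


def run_with_retry(transaction, max_attempts=3, backoff_seconds=0.0):
    """Call transaction() until it succeeds or max_attempts is reached.

    transaction must run the complete unit of work (BEGIN ... COMMIT), because
    a rolled-back transaction has to be replayed from its first statement.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            time.sleep(backoff_seconds * attempt)  # simple linear backoff


# Simulated unit of work: fails twice with a serialization error, then commits.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("could not serialize access due to concurrent update")
    return "committed"


result = run_with_retry(flaky_transaction, max_attempts=5)
```

Retrying is only safe when the unit of work has no side effects outside the database, since it may run several times.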
Hands on a Serialization anomaly

M1, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE ACCOUNT SET BALANCE=666 WHERE NAME='Germs';
  T1 updates the Germs account with an amount of 666, without committing.
M2, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Germs';
  T2 updates the Germs account with an amount of 0, without committing. The UPDATE waits on the row lock held by T1.
M3, Transaction 1:
    COMMIT;
  T1 commits; OK.
M4, Transaction 2:
    COMMIT;
  T2 fails. An error is returned: [40001] ERROR: could not serialize access due to concurrent update, and transaction T2 is rolled back. Postgres raises this error on T2's pending UPDATE as soon as T1 commits, so T2's COMMIT ends as a rollback.

Hands on a Serialization anomaly
Interesting observation here:
- it is not the first transaction opened that wins (and avoids the serialization error), but the first transaction that commits.
Safety pause
4. Locks and deadlocks
Locks
Postgres provides many lock features, with the complexity that comes with them, to deal with read/write conflicts.
Using locks can produce deadlocks.
Here we rely on the locks that Postgres takes implicitly to avoid conflicts during INSERT, UPDATE and DELETE commands: a ROW EXCLUSIVE lock on the table, plus a lock on each modified row.
Locks
Said differently: INSERT, UPDATE and DELETE commands (commands that modify data) acquire locks implicitly.
Example: when a transaction updates a row, it acquires a lock on that row and keeps it until the transaction ends; if another transaction updates the same row, it has to wait. This second update is blocked.
What is a deadlock ?
In concurrent computing, deadlock is any situation in which no
member of some group of entities can proceed because each waits
for another member, including itself, to take action, such as sending
a message or, more commonly, releasing a lock.
Hands on a deadlock

M1, Transaction 1:
    BEGIN TRANSACTION;
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Germs';
  T1 updates the Germs account with an amount of 0, without committing. T1 acquires a lock on the Germs row.
M2, Transaction 2:
    BEGIN TRANSACTION;
    UPDATE ACCOUNT SET BALANCE=1000 WHERE NAME='Rihanna';
  T2 updates the Rihanna account with an amount of 1000, without committing. T2 acquires a lock on Rihanna's row.
M3, Transaction 1:
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Rihanna';
  T1 updates the Rihanna account with an amount of 0. T1 waits on T2's lock.
M4, Transaction 2:
    UPDATE ACCOUNT SET BALANCE=1000 WHERE NAME='Germs';
  T2 updates the Germs account with an amount of 1000. T2 waits on T1's lock, which produces a deadlock. Postgres returns an error: [40P01] ERROR: deadlock detected Detail:[...]. T2 is rolled back.
M5, Transaction 1:
    COMMIT;
  Because T2 has been rolled back, its lock on Rihanna's row has been released; T1 is unblocked and can commit its transaction; OK.

Hands on a deadlock
Good to know:
The possibility of deadlocks is not affected by isolation levels, because the isolation level changes the behaviour of read operations (except Serializable), whereas deadlocks are caused by write operations.
Safety pause
5. Optimistic lock
Optimistic lock
Optimistic locking is a way to update rows without keeping a lock (implicit or explicit) on them between the read and the write.
For that, the application uses a version column to set and identify the versions of rows.
Optimistic lock
Pros: it can be a practical way to avoid holding locks (and so contention) and deadlocks, because an update operation fails immediately if the version check fails. See the hands-on that follows.
Cons: a failed update is tedious to detect. No error is raised, so the application has to check (in logs or through the SQL driver) the number of rows affected by the update: if it is zero, the update has failed.
Hands on Optimistic lock, with a failure

M1, Transaction 1 and Transaction 2:
    SELECT * FROM V_ACCOUNT WHERE NAME='Germs';
  T1 and T2 both read the Germs account and see that the current version is 1.
M2, Transaction 1:
    UPDATE V_ACCOUNT SET BALANCE=500, VERSION=2 WHERE NAME='Germs' AND VERSION=1;
  T1 modifies the Germs account (amount 100 -> 500); it works because version 1 in the WHERE clause is correct. The Germs account is now on version 2.
M3, Transaction 2:
    UPDATE V_ACCOUNT SET BALANCE=0, VERSION=2 WHERE NAME='Germs' AND VERSION=1;
  T2 tries to modify the Germs account (amount 100 -> 0); it fails because version 1 no longer exists in the DB: zero rows affected.

Notice: here we did not open transactions; everything is auto-committed immediately. If T1 and T2 used transactions, T2 would fail at M3 only if T1 had committed at M2.
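The version check above can be driven from application code by looking at the number of rows affected. This is a minimal sketch assuming Python's built-in sqlite3 module in autocommit mode instead of Postgres; the `update_balance` and `read_version` helpers are hypothetical names, and the version is incremented in SQL rather than hard-coded to 2.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, as in the hands-on.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE v_account (name TEXT PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO v_account VALUES ('Germs', 100, 1)")


def read_version(conn, name):
    row = conn.execute("SELECT version FROM v_account WHERE name = ?", (name,)).fetchone()
    return row[0]


def update_balance(conn, name, new_balance, expected_version):
    """Apply the update only if the row is still at expected_version."""
    cur = conn.execute(
        "UPDATE v_account SET balance = ?, version = version + 1 "
        "WHERE name = ? AND version = ?",
        (new_balance, name, expected_version),
    )
    return cur.rowcount == 1  # zero rows affected means another writer got there first


# M1: both writers read the row and see version 1.
t1_version = read_version(conn, "Germs")
t2_version = read_version(conn, "Germs")

# M2: the first writer succeeds and bumps the version to 2.
t1_ok = update_balance(conn, "Germs", 500, t1_version)

# M3: the second writer still expects version 1, so nothing is updated.
t2_ok = update_balance(conn, "Germs", 0, t2_version)

final_row = conn.execute(
    "SELECT balance, version FROM v_account WHERE name = 'Germs'"
).fetchone()
```

When `update_balance` returns False, the caller typically re-reads the row and decides whether to retry with the fresh version or report a conflict.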
6. Go further
- Explicit locks: SELECT ... FOR UPDATE, etc.
- CAP theorem
- BASE vs ACID
- Eventual consistency
Takeaway
A cheat sheet:
https://medium.com/@gsigety/dbms-transaction-sheet-cheat-6b8e0f698ba3
Examples on GitHub:
https://github.com/GermainSIGETY/DBMS-transaction-cheat-sheet
Thanks!
Any questions?
You can find me at:
https://medium.com/@gsigety
