2. “A database management system (DBMS)
allows a person to organize, store, and
retrieve data from a computer. It is a way of
communicating with a computer’s ‘stored
memory.’”
4. Timeline
1960s: First DBMSs, as a replacement for file storage, which suffered from:
- encoding and data-structuring problems
- incompatible file formats between applications
1970: Edgar F. Codd published an academic paper titled "A Relational Model of
Data for Large Shared Data Banks". That paper introduced a new way to model
data: building a set of cross-linked tables.
1971: The CODASYL approach was a very complicated system and required
substantial training. Searching for records could be accomplished by one of
three techniques:
- using the primary key (also known as the CALC key)
- navigating relationships (also called sets) from one record to another
- scanning all records in sequential order
1974: IBM System R, a database system built as a research project at IBM's
San Jose Research Laboratory, beginning in 1974. First system with:
- the SQL language
- transaction processing
1979: RSI introduced Oracle V2 (Version 2) as the first commercially available
SQL-based RDBMS, followed by DB2, SAP Sybase ASE, and Informix.
7. “A database transaction symbolizes a unit of
work, performed within a database
management system (or similar system) against
a database, that is treated in a coherent and
reliable way independent of other transactions.
A transaction generally represents any change
in a database.”
8. Reliability
Would you be happy if a single debit from your account
ended up being executed twice by the bank's
database?
9. Reliability
You book a return flight, but the system ends up
recording only the outbound flight.
11. Atomicity
All operations of a transaction are indivisible, which means that
either:
- all operations are performed successfully, or
- no operation is performed and everything is cancelled.
Illustration: in a financial transfer, an amount has to be
debited from one account and credited to another.
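The debit/credit illustration can be sketched as follows. This is a minimal sketch, not the deck's own code: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `account` schema, the `transfer` helper and the sample balances are assumptions made for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one atomic unit of work (illustrative helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # The credit found no row: abort so the debit is undone too.
                raise LookupError(f"unknown account: {dst}")
        return True
    except LookupError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# Assumed sample data, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Germs", 100), ("Rihanna", 1000)])
conn.commit()

failed = transfer(conn, "Germs", "Nobody", 50)   # credit fails, debit is rolled back
after_failure = balances(conn)
succeeded = transfer(conn, "Germs", "Rihanna", 50)
after_success = balances(conn)
```

After the failed transfer the balances are unchanged; after the successful one, both the debit and the credit are visible together.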
12. Consistency
Also known as correctness:
A database transaction must change affected data only in
allowed ways.
Any data written to the database must be valid according to
all defined rules (including constraints, cascades, triggers,
and any combination thereof).
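As a small illustration of "valid according to all defined rules", the sketch below declares one rule as a constraint and shows a violating write being rejected. It relies on assumptions: Python's built-in `sqlite3` stands in for Postgres, and the `CHECK (balance >= 0)` rule, the `withdraw` helper and the sample row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed rule for the example: a balance may never become negative.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('Germs', 100)")
conn.commit()

def withdraw(conn, name, amount):
    """Try to withdraw; return False if the write would break a constraint."""
    try:
        with conn:  # rolled back automatically when the constraint fails
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balance_of(conn, name):
    return conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()[0]

rejected = withdraw(conn, "Germs", 150)   # would leave -50: refused
balance_after_rejection = balance_of(conn, "Germs")
accepted = withdraw(conn, "Germs", 60)    # leaves 40: allowed
balance_after_acceptance = balance_of(conn, "Germs")
```

The database, not the application, refuses the invalid state, so every client is held to the same rule.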
13. Consistency
Of the database: the guarantee that database constraints are
not violated, particularly once a transaction commits.
Between transactions: the guarantee that any transaction
started in the future necessarily sees the effects of other
transactions committed in the past.
14. Isolation
Isolation determines how transaction operations are visible
to other users and systems.
A transaction should not be disturbed by others.
15. Durability
Durability is the ACID property which guarantees that
transactions that have committed will survive permanently.
If a flight booking reports that a seat has successfully been
booked, then the seat will remain booked even if the system
crashes
17. Isolation levels
… and read phenomena.
- The concepts below belong to pessimistic concurrency control.
- It assumes that conflicts between transactions can happen
often, and it locks data records when a user starts to update them,
so other users cannot update that data until the
lock is released.
18. Isolation levels
Examples are based on the Postgres implementation.
See https://github.com/GermainSIGETY/DBMS-transaction-cheat-sheet
21. Read Uncommitted isolation level
Lowest isolation level. At this level, the following can occur:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
… but this isolation level is not implemented in Postgres: requesting it
gives Read Committed behavior, the lowest level Postgres provides.
22. Read Committed isolation level
Default isolation level in Postgres. At this level, the read
phenomenon prevented is:
- Dirty Read
But the following can still occur:
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
23. What is a dirty read ?
A transaction reads data written by a concurrent uncommitted
transaction.
24. Hands on a dirty read fixed:
M1, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    UPDATE ACCOUNT SET BALANCE=500 WHERE NAME='Germs';
  Comment: T1 modifies a row (balance 100 -> 500) without committing it.
M2, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' account; the balance returned is still 100.
M3, Transaction 1:
    COMMIT;
  Comment: T1 commits its change.
M4, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' account again; the balance returned is now 500.
25. Repeatable Read isolation level
At this level, the read phenomena prevented are:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
But the following can still occur:
- Serialization Anomaly
26. What is a Nonrepeatable read ?
A transaction re-reads data it has previously read and finds that
data has been modified by another transaction (that committed
since the initial read).
27. Hands on a Nonrepeatable read fixed:
M1, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' account; the balance returned is 100.
M2, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    UPDATE ACCOUNT SET BALANCE=500 WHERE NAME='Germs';
    COMMIT;
  Comment: T1 modifies a row (balance 100 -> 500) and commits.
M3, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' account again (in the same transaction opened at
  M1); the balance returned is still 100.
28. What is a phantom read ?
A transaction re-executes a query returning a set of rows that satisfy
a search condition and finds that the set of rows satisfying the
condition has changed due to another recently-committed
transaction.
29. Hands on a phantom read fixed:
M1, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' accounts; only one row is returned.
M2, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    INSERT INTO ACCOUNT (BALANCE, NAME) VALUES (20, 'Germs');
    COMMIT;
  Comment: T1 adds a second account for Germs and commits.
M3, Transaction 2:
    SELECT * FROM account WHERE name='Germs';
  Comment: T2 reads Germs' accounts again (in the same transaction opened at
  M1); still only one row is returned.
30. Hands on a phantom read fixed:
The same protection applies if, instead of adding a row
(as above), T1 deletes one:
T2 would still see the same number of rows as before the deletion.
31. Serializable isolation level
At this level, the read phenomena prevented are:
- Dirty Read
- Nonrepeatable Read
- Phantom Read
- Serialization Anomaly
32. What is a Serialization anomaly ?
The result of successfully committing a group of transactions is
inconsistent with all possible orderings of running those transactions
one at a time.
33. What is a Serialization anomaly ?
Imagine a flow of two transactions:
- both perform write operations
- the write operations occur while both transactions are
still open
34. What is a Serialization anomaly ?
The concurrent execution of the two transactions is said to be serializable if
- its result in the DB is identical to that of executing T1 completely
then T2 completely, or T2 completely then T1 completely.
It is said to be not serializable if
- its result in the DB matches neither T1 then T2, nor T2 then T1.
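The definition can be checked mechanically on a toy model. The sketch below is only an illustration under assumptions: each transaction is modeled as a pure function on a single balance, and the helpers `serial_outcomes` and `is_serializable`, as well as the two sample transactions, are hypothetical names invented here.

```python
from itertools import permutations

def serial_outcomes(initial, transactions):
    """Results obtained by running the transactions one at a time, in every order."""
    outcomes = set()
    for order in permutations(transactions):
        state = initial
        for txn in order:
            state = txn(state)
        outcomes.add(state)
    return outcomes

def is_serializable(observed, initial, transactions):
    """True if the observed result matches at least one serial ordering."""
    return observed in serial_outcomes(initial, transactions)

# Two assumed read-modify-write transactions on one balance.
def add_100(balance):
    return balance + 100

def double(balance):
    return balance * 2

possible = serial_outcomes(100, [add_100, double])
# add_100 then double gives 400; double then add_100 gives 300.

# A lost update: both transactions read 100, and the last write (200) wins.
lost_update_ok = is_serializable(200, 100, [add_100, double])   # False
serial_ok = is_serializable(400, 100, [add_100, double])        # True
```

A result of 200 corresponds to no serial ordering, which is exactly the kind of outcome the Serializable level refuses to commit.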
35. What is a Serialization anomaly ?
Postgres cannot perform magic tricks: if a transaction provokes a
serialization anomaly, Postgres returns an error for one of
the transactions, and that transaction is rolled back.
It’s up to you, valiant developer, to handle this error in your code :)
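One common way to handle this error in application code is to retry the whole transaction. The sketch below shows only the retry loop and makes several assumptions: `SerializationFailure` is a hypothetical stand-in for whatever exception your SQL driver raises for SQLSTATE 40001, and `run_with_retry` and `flaky_transfer` are illustrative names, not part of any library.

```python
import time

class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""

def run_with_retry(work, max_attempts=3, base_delay=0.0):
    """Call work() until it succeeds, retrying only on serialization failures.

    work must run one complete transaction (begin, statements, commit),
    so that each retry starts again from a fresh snapshot.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            time.sleep(base_delay * attempt)  # simple linear backoff

# Simulated transaction that fails twice before succeeding.
attempts = []

def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"

result = run_with_retry(flaky_transfer)
```

Retrying is only safe when the transaction body has no side effects outside the database, since it may run several times.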
36. Hands on a Serialization anomaly
M1, Transaction 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE ACCOUNT SET BALANCE=666 WHERE NAME='Germs';
  Comment: T1 updates Germs' account to a balance of 666, without committing.
M2, Transaction 2:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Germs';
  Comment: T2 tries to update Germs' account to a balance of 0; the UPDATE
  waits on the row lock held by T1.
M3, Transaction 1:
    COMMIT;
  Comment: T1 commits; OK. T2's waiting UPDATE then fails with
  [40001] ERROR: could not serialize access due to concurrent update.
M4, Transaction 2:
    COMMIT;
  Comment: T2 cannot commit; it is rolled back.
37. Hands on a Serialization anomaly
An interesting observation here:
- it is not the first transaction opened that wins (and avoids the
serialization error), but the first transaction that commits.
40. Locks
Postgres provides many lock features, with some complexity,
in order to deal with read/write conflicts.
Using locks can produce deadlocks.
Here we will rely on the 'ROW EXCLUSIVE' lock: Postgres acquires it
implicitly (at table level, despite its name) for INSERT,
UPDATE, and DELETE commands.
41. Locks
Said differently: INSERT, UPDATE, and DELETE commands
(commands that modify data) acquire this lock implicitly.
Example: when a transaction updates a row, it also acquires a lock on
that row and keeps it until the transaction ends. If another transaction
updates the same row, it has to wait: this second update is blocked.
42. What is a deadlock ?
In concurrent computing, deadlock is any situation in which no
member of some group of entities can proceed because each waits
for another member, including itself, to take action, such as sending
a message or, more commonly, releasing a lock.
43. Hands on a deadlock
M1, Transaction 1:
    BEGIN TRANSACTION;
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Germs';
  Comment: T1 updates Germs' account to a balance of 0, without committing.
  T1 acquires a lock on Germs' row.
M2, Transaction 2:
    BEGIN TRANSACTION;
    UPDATE ACCOUNT SET BALANCE=1000 WHERE NAME='Rihanna';
  Comment: T2 updates Rihanna's account to a balance of 1000, without
  committing. T2 acquires a lock on Rihanna's row.
M3, Transaction 1:
    UPDATE ACCOUNT SET BALANCE=0 WHERE NAME='Rihanna';
  Comment: T1 updates Rihanna's account to a balance of 0. T1 blocks on
  T2's lock.
M4, Transaction 2:
    UPDATE ACCOUNT SET BALANCE=1000 WHERE NAME='Germs';
  Comment: T2 updates Germs' account to a balance of 1000. T2 blocks on
  T1's lock, which produces a deadlock. Postgres returns an error:
  [40P01] ERROR: deadlock detected Detail:[...]. T2 is rolled back.
M5, Transaction 1:
    COMMIT;
  Comment: Because T2 has been rolled back, the lock on Rihanna's row has
  been released; T1 is unblocked and can commit its transaction; OK.
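The situation at M4 is a cycle of waiting transactions: T1 waits for T2 and T2 waits for T1. The sketch below looks for such a cycle in a "who waits for whom" map. It is an illustration only, not a description of Postgres's actual detector; `find_deadlock` and the dictionaries are hypothetical, and each transaction is assumed to wait for at most one other.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a waiting cycle, or None if there is none.

    waits_for maps a blocked transaction to the transaction holding the lock
    it needs (assumption: at most one blocker per transaction).
    """
    for start in waits_for:
        seen = [start]
        current = waits_for.get(start)
        # Follow the chain of blockers until it ends or loops back.
        while current is not None and current not in seen:
            seen.append(current)
            current = waits_for.get(current)
        if current is not None:
            return seen[seen.index(current):]
    return None

# State at M3: only T1 is blocked, waiting for T2's lock on Rihanna's row.
at_m3 = {"T1": "T2"}
# State at M4: T2 is now also blocked, waiting for T1's lock on Germs' row.
at_m4 = {"T1": "T2", "T2": "T1"}

no_cycle = find_deadlock(at_m3)   # None: T2 can still proceed
cycle = find_deadlock(at_m4)      # ['T1', 'T2']: nobody can proceed
```

Once a cycle exists, the only way out is to abort one of its members, which is what happens to T2 in the table above.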
44. Hands on a deadlock
Good to know:
The possibility of deadlocks is not affected by isolation levels,
because isolation levels change the behavior of read operations
(except Serializable), whereas deadlocks occur due to write
operations.
47. Optimistic lock
Optimistic locking is a way to perform updates on rows
without relying on implicit or explicit row locks held between
the read and the write.
For that, the application uses a version column to set and
identify versions of rows.
48. Optimistic lock
Pros: it can be a practical way to avoid holding locks (and therefore
contention) and deadlocks, because an update
fails immediately if the version check fails (see the next slide).
Cons: detecting a failed update is tedious: the only signal is the
number of rows affected by the update (reported in logs or by the
SQL driver). If it is zero, the update has failed.
49. Hands on Optimistic lock, with a failure
M1, Transactions 1 and 2 (each runs the same query):
    SELECT * FROM V_ACCOUNT WHERE NAME='Germs';
  Comment: T1 and T2 both read Germs' account and see that the current
  version is 1.
M2, Transaction 1:
    UPDATE V_ACCOUNT SET BALANCE=500, VERSION = 2
    WHERE NAME='Germs' AND VERSION=1;
  Comment: T1 modifies Germs' account (balance 100 -> 500); it works because
  version 1 in the WHERE clause is correct. Germs' account is now at
  version 2.
M3, Transaction 2:
    UPDATE V_ACCOUNT SET BALANCE=0, VERSION = 2
    WHERE NAME='Germs' AND VERSION=1;
  Comment: T2 tries to modify Germs' account (balance 100 -> 0); it fails
  because version 1 no longer exists in the DB: zero rows affected.
Notice: here we did not open transactions; every statement is auto-committed immediately. If
T1 and T2 used transactions, T2 would fail at M3 only if T1 had committed at M2.
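The same scenario can be replayed from application code, where the affected-row count is the only signal of failure. This is a minimal sketch under assumptions: Python's built-in `sqlite3` stands in for Postgres, the `v_account` columns are inferred from the statements above, and `read_version` and `optimistic_update` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_account (name TEXT PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO v_account VALUES ('Germs', 100, 1)")
conn.commit()

def read_version(conn, name):
    """Read the current version of a row (the M1 step)."""
    return conn.execute(
        "SELECT version FROM v_account WHERE name = ?", (name,)
    ).fetchone()[0]

def optimistic_update(conn, name, new_balance, expected_version):
    """Update only if the row is still at expected_version; report success."""
    with conn:  # each call commits immediately, like the auto-commit example
        cur = conn.execute(
            "UPDATE v_account SET balance = ?, version = version + 1 "
            "WHERE name = ? AND version = ?",
            (new_balance, name, expected_version),
        )
    return cur.rowcount == 1  # zero rows affected means the check failed

v_t1 = read_version(conn, "Germs")   # T1 sees version 1
v_t2 = read_version(conn, "Germs")   # T2 sees version 1
t1_ok = optimistic_update(conn, "Germs", 500, v_t1)  # succeeds, row is now version 2
t2_ok = optimistic_update(conn, "Germs", 0, v_t2)    # fails, version 1 is gone
final_row = conn.execute(
    "SELECT balance, version FROM v_account WHERE name = 'Germs'"
).fetchone()
```

On failure the caller typically re-reads the row and decides whether to retry with the new version or report the conflict.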