Transaction
Management
COMP9311 24T2; Week 8.1
By Zhengyi Yang, UNSW
Transaction
A transaction is a unit of program execution that accesses and possibly
updates various data items.
E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Two main issues to deal with:
➢ Failures of various kinds, such as hardware failures and system crashes
➢ Concurrent execution of multiple transactions
2
Issue (1)
Concurrent execution of multiple transactions is needed
Why?
➢
i. Multiple users/transactions may read/change the same data
ii.Allowing multiple transactions to update data concurrently can result in complications -
data inconsistency.
Therefore transaction processing systems...
➢
i. need to support multiple transactions at the same time.
ii.usually allow multiple transactions to run concurrently.
3
Issue (2)
Failures of various kinds
a.System failure:
i. Disk failure - e.g., head crash, media fault.
ii.System crash - e.g., unexpected failure requiring a reboot.
b.Program error:
i. e.g., divide by zero.
c.Exception conditions:
i. e.g., no seats for your reservation.
d.Concurrency control:
i. e.g., deadlock, expired locks.
Transaction Processing Systems need to be robust against failure
4
Example of Fund Transfer (1)
A transaction is a unit of program execution that accesses and
possibly updates various data items.
Example: A possible transaction to transfer $50 from account A
to account B:
Each transaction typically
1. read(A)
includes some database
access operations
2. A := A – 50
3. write(A)
Operations relevant to
4. read(B) transaction processing:
1. Read
5. B := B + 50
2. Write
6. write(B)
3. Computation
5
Example of Fund Transfer (2)
Atomicity requirement
If the transaction fails after step 3 and before step 6, money will be “lost”
➢
leading to an inconsistent database state
➢ Failure could be due to software or hardware
The system should ensure that updates of a partially executed
➢
transaction are not reflected in the database (all-or-nothing)
Example:
Transaction to transfer $50
from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
6
Example of Fund Transfer (3)
Durability requirement
Once the user has been notified that the transaction has completed (i.e.,
➢
the transfer of the $50 has taken place), the updates to the database by
the transaction must persist even if there are software or hardware
failures.
Example:
Transaction to transfer $50
from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
7
Example of Fund Transfer (4)
Consistency requirement in above example:
➢ The sum of A and B is unchanged by the execution of the transaction
In general, consistency requirements include
➢ Explicitly specified integrity constraints such as primary keys and foreign keys
➢ Implicit integrity constraints
➢ A transaction must see a consistent database.
During transaction execution the database may be temporarily inconsistent.
➢ When the transaction completes successfully, the database must be consistent
➢ Erroneous transaction can lead to inconsistency
8
Example of Fund Transfer (5)
Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be)
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially. That is, one after the other.
However, executing multiple transactions concurrently has significant benefits.
9
ACID Summary
A transaction is a unit of program execution that accesses and possibly updates various data items. To
preserve the integrity of data, the database system must ensure the ACID property.
➢ Atomicity: Either all operations of the transaction are properly reflected in the database, or none are.
➢ Consistency: Every transaction sees a consistent database.
➢ Isolation: Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be hidden
from other concurrently executed transactions.
➢ That is, for every pair of transactions T and T, it must appear to T that either T, finished execution
i j i j
before T started, or T started execution after T finished.
i j i
➢ Durability: After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
10
Transaction States
Active – the initial state; the transaction stays in this state while it is executing
Partially committed – after the final statement has been executed.
Failed -- after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back using log and the database restored to its state
prior to the start of the transaction. Two options after it has been aborted:
➢ Restart the transaction
➢ Kill the transaction
Committed – after successful completion.
11
Concurrent Executions
Multiple transactions are allowed to run concurrently in the system.
The advantages are:
➢ Increased processor and disk utilization
➢ leading to better transaction throughput
➢ E.g., one transaction can be using the CPU while another is reading from or writing to the disk
➢ Reduced average response time for transactions: short transactions need not wait
behind long ones.
Concurrency control schemes – mechanisms to achieve isolation
➢ That is, to control the interaction among the concurrent transactions to prevent them
from destroying the consistency of the database
12
A Simple Transaction Model
We do not consider the full set of SQL language, and ignore SQL insertion/delete operations.
Two operations:
➢
read(X) to transfer the data item X from database to a variable, also called X in a buffer in main memory.
➢
write(X) to transfer the value in the variable X in the buffer to the data item in the database.
Recall example: transaction transfers $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
13
More Concepts
Concurrency/Isolation/Schedule/Control
➢ The concurrent execution in a database system is similar to the multiprogramming in
an operating system (OS).
➢ When several transactions run concurrently, the isolation property may be violated.
➢ A schedule can help identify the executions that are guaranteed to ensure the isolation
property and thus database consistency.
➢ A concurrency-control scheme is to control the interaction among the concurrent
transactions to prevent them from destroying the consistency of the database.
14
Schedules
Schedule – a sequence that specifies the order in which instructions of concurrent
transactions are executed.
➢ A schedule for a set of transactions must consist of all instructions of those transactions
➢
Must preserve the order in which the instructions appear in each individual transaction.
➢ A transaction that successfully completes its execution will have a commit instruction as the
last statement
➢
By default, transaction is assumed to execute commit instruction as its last step
➢ A transaction that fails to successfully complete its execution will have an abort instruction
as the last statement
15
Two Transactions
Consider that the system receives two transactions
Let T transfer $50 from A to B,
1
Let T transfer 10% of the balance from A to B.
2
T2
T1
read (A)
read (A)
temp := A * 0.1
A := A – 50
A := A - temp
write (A)
write (A)
read (B)
read (B)
B := B + 50
B := B + temp
write (B)
write (B)
commit
commit
16
Schedule 1
A schedule S is serial if for every T1 T2
read (A)
transaction T in the schedule, all (the
A := A – 50
operations of) T are executed
write (A)
read (B)
consecutively in the schedule.
B := B + 50
write (B)
Example: a serial schedule in which T is
1 commit
followed by T :
read (A)
2
temp :=A * 0.1
A := A - temp
write (A)
read (B)
B := B + temp
write (B)
commit
17
Schedule 2
Example: another valid serial schedule where T is followed by T
2 1
T1 T2
read (A)
temp :=A * 0.1
A := A - temp
write (A)
read (B)
B := B + temp
write (B)
commit
read (A)
A := A – 50
write (A)
read (B)
B := B + 50
write (B)
18
commit
Schedule 3
T1 T2
Let T and T be the transactions given previously.
1 2
read (A)
➢ The following schedule is not a serial schedule A := A – 50
write (A)
➢ but it is equivalent to Schedule 1 read (A)
temp :=A * 0.1
A := A - temp
write (A)
read (B)
B := B + 50
write (B)
In Schedules 1, 2 and 3, the sum A + B is preserved.
commit
read (B)
B := B + temp
write (B)
commit
19
Schedule 4
T1 T2
The following concurrent schedule:
read (A)
➢ does not preserve the value of (A + B). A := A – 50
read (A)
temp :=A * 0.1
A := A - temp
write (A)
read (B)
write (A)
read (B)
B := B + 50
write (B)
Not all concurrent schedules are desirable
commit
B := B + temp
write (B)
commit
20
How to Avoid These Problems?
If operations are interleaved arbitrarily, incorrect results may occur.
➢ Isolation can be ensured trivially by running transactions serially.
Question: why not run only serial schedules?
Answer: Because of very poor throughput due to disk latency
➢ If a transaction waits for an I/O operation to complete, we cannot switch the CPU
processor to another transaction, thus wasting valuable CPU processing time.
➢ Additionally, if some transaction T is quite long, the other transactions must wait for T
to complete all its operations before starting.
21
How to Avoid These Problems?
Question: why run non-serial schedules?
➢ It is desirable to interleave the operations of transactions in an appropriate way.
➢ We can fully utilise resources.
➢ For example, if one transaction is waiting for I/O to complete, another transaction can use the CPU.
Point:
➢ serial schedules are considered unacceptable in practice
➢ executing multiple transactions concurrently has significant benefits.
22
Motivation (1)
We need to study the notion of correctness of concurrent executions.
➢ Every transaction is executed from beginning to end in isolation from the
operations of other transactions, we get a correct end result on the database.
We first need to define types of schedules that are always considered
to be correct when concurrent transactions are executing.
23
Motivation (2)
Question: How do we determine if non-serial schedules are
correct?
Intuition: If we can determine which non-serial schedules are
equivalent to a serial schedule, we can allow these schedules to
occur.
24
Summary
➢ Every serial schedule is considered correct
➢ We can assume this because every transaction is assumed to be correct if executed
on its own (according to the consistency preservation property).
➢ For serial schedules, it does not matter which transaction execute first. They are all
correct.
Basic Assumption – Each transaction preserves database consistency
➢ Therefore, a serial execution of a set of transactions preserves database consistency.
➢ Serial executions are correct
25
Serializability
A (possibly concurrent) schedule S of n transactions is
serializable if
it is equivalent to some serial schedule of the same n transactions.
➢
There are many notions forms of schedule equivalence give rise
to the notion of Conflict serializability (Will Discuss Later)
26
A Simplified View of Transactions
We ignore operations other than read and write instructions
We assume that transactions may perform arbitrary computations on data in local buffers in
between reads and writes.
Our simplified schedules consist of only read and write instructions.
An example: For a transaction that transfers $50 from account A to account B:
1. read(A)
We simply considers the
2. A := A – 50
read/write only
3. write(A) read(A)
4. read(B) write(A)
5. B := B + 50 read(B)
write(B)
6. write(B)
27
Conflicting Instructions
Instructions l and l of different transactions T and T respectively, conflict if and only if there exists
i j i j
some item Q accessed by both l and l, and at least one of these instructions wrote Q.
i j
1. l = read(Q), l = read(Q). l and l don’t conflict.
i j i j
2. l = read(Q), l = write(Q). They conflict.
i j
3. l = write(Q), l = read(Q). They conflict
i j
4. l = write(Q), l = write(Q). They conflict
i j
Intuitively, a conflict between l and l forces a (logical) temporal order between them. i.e., changing
i j
their order can result in a different combined outcome.
If l and l are consecutive in a schedule and they do not conflict, their results would remain the same
i j
even if they had been interchanged in the schedule.
For example, read–read operations are are not conflicting.
28
Summary: Conflicting Instructions
Summary: Two operations O and O are conflicting if
1 2
➢ They are in different transactions
➢ They access the same data item,
➢ At least one of them must be a write.
Language: The transaction of the second operation in the pair is
said to be in conflict with the transaction of the first operation.
29
Conflict Equivalence
Two schedules are said to be conflict equivalent if:
➢ the order of any two conflicting operations is the same in both schedules.
Two schedules are conflict equivalent if:
➢ Involve the same actions of the same transactions
➢ Every pair of conflicting actions is ordered the same way
For two schedules to be conflict equivalent:
➢ the operations applied to each data item affected by the schedules should be applied to
that item in both schedules in the same order.
We define equivalence of schedules by conflict equivalence, which is the more
commonly used definition
A better definition of equivalence compared to result equivalence .
30
Conflict Serializability
Using the notion of conflict equivalence, we define a schedule S to be conflict serializable if it is
(conflict) equivalent to some serial schedule S.
For conflict serializable schedules:
➢ we can reorder the nonconflicting operations in S until we form the equivalent serial schedule S.
This means that if a schedule can be transformed to any serial schedule without changing orders of
conflicting operations (but changing orders of non-conflicting, while preserving operation order inside
each transaction), then the outcome of both schedules is the same, and the schedule is conflict-
serializable by definition.
If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting
instructions, we say that S and S’ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
31
Conflict Serializability (2)
Schedule 3 can be transformed into Schedule 6, a serial schedule where T follows T , by
2 1
series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
T1 T2 T1 T2
read (A) read (A)
write (A) write (A)
read (A) read (B)
write (A) write (B)
read (B) read (A)
write (B) write (A)
read (B) read (B)
write (B) write (B)
Schedule 3 Schedule 6
Note: any conflict serializable schedule is also a serializable schedule
32
Conflict Serializability (3)
Example of a schedule that is not conflict serializable:
T3 T4
read (Q)
write (Q)
write (Q)
We are unable to swap instructions in the above schedule to obtain
either the serial schedule
3 4 4 3
Note: Not all schedules are conflict serializable.
33
Allow Some Concurrent Schedules
Now we characterized the types of schedules that are always
considered to be correct when concurrent transactions are
executing.
The concept of serializability of schedules is used to identify
which schedules are correct when transaction executions have
interleaving of their operations in the schedules.
Since serial schedules are not practical, we can allow conflict
serializable schedules since they are correct!
34
Allow Some Concurrent Schedules
Saying that a non-serial schedule S is serializable is equivalent to
saying that it is correct
because it is equivalent to a serial schedule, which is always
➢
considered correct.
Practice:
Is a serializable schedule correct?
1.
Is being serializable the same as being serial.
2.
Is a non-serializable schedule correct?
3.
Is a nonserial schedule correct?
4.
35
Learning Outcomes
Transaction Concept
➢
Transaction State
➢
Concurrent Executions
➢
Serializability
➢
A nonserial schedules can be one of the following case:
➢
➢ those that are equivalent to one (or more) of the serial schedules
➢ those that are not equivalent to any serial schedule and hence are not
serializable.
36