Serializability Theory
Consider a database D = (x, y, z), on which we will concurrently
perform a series of transactions T1, T2, ..., Tn. We want some way of
knowing whether we executed the transactions
"correctly." Formally, we state that an
execution is correct if and only if it is equivalent to some serial
execution of the transactions. In other words, a correct concurrent
execution of the transactions produces the same result we would get if we
executed them one at a time.It may not be clear how executing
transactions concurrently can cause problems. Date [1] identifies the Lost
Update problem as a potential problem for concurrently executing
transactions. A lost update occurs when two transactions, A and B, both
update the same location R in the database. Suppose A reads R, but before
A can update R, B reads R. Then, A updates R, and B updates R based on R's
previous contents. B has overwritten A's update; A's update, thus, is
lost. Note that this would not be a problem has A and B executed
sequentially: A would have updated R before B read R.
In this section,
we will consider how to use logs to detect whether a concurrent execution
was serializable (ie correct).
Logs
A log is simply a record of all operations which each transaction
performed. The log entries are chronological; each entry in the log gives
the following information>- operation type (read or write)
- data on
which the operation was performed
- transaction which performed the
operation
As we shall see, logs are central to our study of serializability.
Example: Consider the transactions T1, T2, and T3, defined below:
T1 = r1[x] r1[z] w1[x]
T2 = r2[y] r2[z] w2[y]
T3 = w3[x] r3[y] w3[z]
Two possible logs of their execution are:
L1 = w3[x] r1[x] r3[y] r2[y] w3[x] r2[z] w2[y] w1[x]
L2 = w3[x] r3[y] w3[z] r2[y] r2[z] w2[y] r1[x] r1[z] w1[x].
Serial Logs
A serial log is just a log in which transactions are run one at a time. In
a serial log, there is no interleaving of operations from different
transactions. L2 above is an example of a serial log, corresponding to the
execution {T3, T2, T1}. L1 above is not a serial log, since an operation
from transaction 1 (r1) occurs between two operations from transaction 3
(w3 and r3).
Log Equivalence
Intuitively, two logs are equivalent if the executions that produced them
leave the database in the same state. The formal definition of equivalence
is: Two logs are equivalent if and only if:
- Each read operation
reads from the same write operation in both logs, and
- Both logs have the
same final writes.
We say a read operation j reads from a write
operation i if read operation j reads a value most recently written by
write operation i. The final write of a log is the log's last write
operation.
Serializable Logs
Since we consider serial logs to be
"correct", but we also want to execute transactions concurrently,
we want to know whether a concurrent log is serializable; that
is, equivalent to a serial log of the same transactions. Knowing this will
allow us to execute transactions concurrently, but also verify the
correctness of the execution.
The Serializability Theorem
To
determine whether a log is serializable, we construct its serialization
graph. This graph is constructed as follows: The transactions {T1,
T2, ...,Tn} are nodes in the graph. There is a directed edge from Ti to Tj
if and only if, for some x, one of the following
hold:- ri[x]<wj[x],
- wi[x]<rj[x], or
- wi[x]<wj[x].
The
Serializability Theorem states that: A log L is
serializable if and only if SG(L) (the serialization graph) is
acyclic.Thus, given a transaction log, we can construct its
serialization graph and determine whether the log is
serializable.
Summary
In this section, we have determined how to
tell whether a concurrent execution of transactions on a database was
correct. We first decided that correctness meant that the execution was
equivalent to a serial execution of the transactions. Then, we found that,
by constructing a serialization graph, we can test its serializability,
since serializability is equivalent to the graph's being
acyclic.
Study Questions
1. Consider the following two
transactions, and a log of their execution:T1 = r1[a] r1[b] w1[c]
T2 = w2[b] w2[a] r2[c]
L = r1[z] r1[y] w2[y] w2[z] r2[x] w1[x]
Construct the serializability
graph for this log, and show that this execution is not
serializable.
2. Rearrange the log L above so that it is serializable. Try to do this
without making L a serial log (ie retain concurrent execution of T1 and
T2).
References
[1] Date, C.J. An Introduction to
Database Systems, Addison-Wesley, 1990,
[2] Singhal, M and N Shivaratri. Advanced Concepts in Operating
Systems, McGraw-Hill, 1994, pp. 488-491.