Distributed DBMS model

Transactions, Architecture

by Constantinos Phanouriou


Introduction

In the 1960s and 1970s, most efforts in operating system design focused on the so-called traditional operating systems, which ran on stand-alone computers with single processors. Database systems were therefore generally implemented as applications on top of a general-purpose operating system. Since the requirements of a database system differ from those of a general-purpose system, new functionality must be added to the operating system to support a database system efficiently on top of it (see Fig. 1(a)). Sometimes the new features are not appropriate for an operating system and have to be implemented in user space at a much higher overhead.

Another approach is to write a new operating system which efficiently supports only the functions needed by database systems (see Fig. 1(b)). Although such systems will have higher performance for database applications, their implementation from scratch might be expensive.


Requirements of a Database Operating System

Features of a database system:

Transaction Management:

A user accesses a database system by executing a program, called a transaction. A transaction is a sequence of read and write operations on the database. It is the unit of user interaction with the database system. A transaction must maintain database consistency, even when several transactions are running concurrently (the problem of concurrency control). The system must ensure that a transaction either runs to completion or does not run at all (transaction atomicity). The system must also ensure that, in case of a system failure, all partially completed transactions are either undone or run to completion (failure recovery).
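
As a minimal sketch of transaction atomicity, assume a toy in-memory database held in a Python dict. The names run_atomically and transfer are hypothetical and chosen only for illustration. The transaction works on a private copy, and its writes are installed only if it runs to completion:

```python
import copy

def run_atomically(db, transaction):
    """Run `transaction` against a private copy of `db`.

    The writes are installed in `db` only if the transaction finishes
    without raising; otherwise `db` is left untouched.
    """
    working = copy.deepcopy(db)
    try:
        transaction(working)
    except Exception:
        return False          # abort: no partial effects become visible
    db.clear()
    db.update(working)        # commit: all writes become visible together
    return True

def transfer(amount):
    """Build a transaction that moves `amount` from account A to account B."""
    def body(data):
        data["A"] -= amount
        if data["A"] < 0:
            raise ValueError("insufficient funds")
        data["B"] += amount
    return body

accounts = {"A": 100, "B": 50}
ok = run_atomically(accounts, transfer(30))       # runs to completion
failed = run_atomically(accounts, transfer(500))  # aborts, state unchanged
```

A real system would not copy the whole database; it would typically use logging or shadow pages. The sketch only illustrates the all-or-nothing behaviour.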

The operating system should support mechanisms to facilitate the implementation of these properties in transactions: concurrency control, atomicity, and failure recovery.

Support for Complex, Persistent data:

A database system must support definition, efficient manipulation, and efficient storage on secondary devices of files with complex structures. Thus, an operating system must efficiently organize a file on secondary storage.

Buffer Management:

Database systems are dominated by heavy I/O, and I/O traffic is usually a bottleneck. Data must be cached in buffers in main memory to speed up transactions. Thus, a database system requires mechanisms to perform the following operations efficiently: search the buffer to see if a page is present; select a page for replacement; and locate and retrieve the needed data page from secondary storage.
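
The three buffer operations can be sketched as follows. This is an illustration only: the text does not name a replacement policy, so least-recently-used (LRU) replacement is an assumption, as are the class name BufferPool and the dict that stands in for secondary storage.

```python
from collections import OrderedDict

class BufferPool:
    """Toy page buffer with least-recently-used (LRU) replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # stands in for secondary storage
        self.pages = OrderedDict()    # page_id -> contents, oldest first
        self.disk_reads = 0

    def get(self, page_id):
        # 1. Search the buffer to see if the page is present.
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        # 2. Select a page for replacement if the buffer is full.
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        # 3. Locate and retrieve the page from secondary storage.
        self.disk_reads += 1
        self.pages[page_id] = self.disk[page_id]
        return self.pages[page_id]

disk = {1: "page-1", 2: "page-2", 3: "page-3"}
pool = BufferPool(capacity=2, disk=disk)
pool.get(1)
pool.get(2)
pool.get(1)      # hit: no disk read
pool.get(3)      # miss: evicts page 2, the least recently used
```

The sketch ignores modified (dirty) pages, which a real buffer manager would have to write back before eviction.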

Database Systems

A database system consists of a set of shared data objects that can be accessed by users. A database can be viewed as a collection of data objects. The state of a database is given by the values of its data objects. In a database, certain semantic relationships, called consistency assertions or integrity constraints, must hold among its data objects. A database is said to be consistent if the values of its data objects satisfy all of its consistency assertions.
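
To make the definition concrete, here is a small sketch in which a database state is a dict and each consistency assertion is a predicate on that state. The two-account bank and its assertions are hypothetical examples, not taken from the text.

```python
def is_consistent(state, assertions):
    """A state is consistent if every consistency assertion holds for it."""
    return all(check(state) for check in assertions)

# Hypothetical assertions for a two-account bank database.
assertions = [
    lambda s: s["A"] >= 0 and s["B"] >= 0,   # no negative balances
    lambda s: s["A"] + s["B"] == 150,        # total money is conserved
]

before = {"A": 100, "B": 50}
midway = {"A": 70, "B": 50}    # debit applied, credit not yet applied
after = {"A": 70, "B": 80}

print(is_consistent(before, assertions))
print(is_consistent(midway, assertions))
print(is_consistent(after, assertions))
```

The intermediate state violates the second assertion, which is why the intermediate states of a transaction should not be visible to other transactions.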

Transactions:

A transaction is the unit of user interaction with the database. We assume the following properties about a transaction: If a transaction modifies at least one data object, then it is called an update transaction, or an update. Otherwise it is called a read-only transaction, or a query. The set of data objects read by a transaction is referred to as its readset, and the set of data objects written by it as its writeset.
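
A short sketch of these definitions, assuming a transaction is represented as a list of (operation, object) pairs where the operation is "r" for a read or "w" for a write. This representation and the function names are assumptions made for illustration.

```python
def read_write_sets(actions):
    """Return (readset, writeset) for a list of ("r" | "w", object) actions."""
    readset = {obj for op, obj in actions if op == "r"}
    writeset = {obj for op, obj in actions if op == "w"}
    return readset, writeset

def is_update(actions):
    """An update transaction writes at least one data object."""
    _, writeset = read_write_sets(actions)
    return len(writeset) > 0

t1 = [("r", "x"), ("r", "y"), ("w", "x")]   # an update
t2 = [("r", "x"), ("r", "z")]               # a query (read-only)
```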

Conflicts:

Two transactions conflict when they access the same data object and at least one of the accesses is a write. Two reads of the same data object do not conflict.
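
Conflicts can be detected from readsets and writesets. The sketch below assumes the usual refinement that two accesses to the same data object conflict only if at least one of them is a write; the function name and the dict representation of a transaction are hypothetical.

```python
def conflicts(t1, t2):
    """Return the data objects on which two transactions conflict.

    Each transaction is a dict with "readset" and "writeset" entries.
    Two accesses to the same object conflict when at least one is a write.
    """
    return (
        (t1["writeset"] & t2["readset"])
        | (t1["readset"] & t2["writeset"])
        | (t1["writeset"] & t2["writeset"])
    )

ta = {"readset": {"x", "y"}, "writeset": {"x"}}
tb = {"readset": {"x", "z"}, "writeset": set()}
tc = {"readset": {"z"}, "writeset": {"z"}}

print(conflicts(ta, tb))   # ta writes x, tb reads x
print(conflicts(tb, tc))   # tb reads z, tc writes z
print(conflicts(ta, tc))   # no shared objects
```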

Transaction Processing

A transaction is executed by carrying out its actions one by one, from beginning to end.

Concurrency Control Model

We can view a database system as consisting of three software modules: a transaction manager (TM), a data manager (DM), and a scheduler (see Fig. 2).

The transaction manager supervises the execution of a transaction. It intercepts and executes all the submitted transactions. Thus, the TM is the interface between users and the database system.

The scheduler is responsible for enforcing concurrency control. It grants or releases locks on data objects as requested by a transaction.

The data manager manages the database. It carries out the read and write requests issued by the transaction manager on behalf of a transaction by applying them to the database. It is also responsible for failure recovery. Thus, the DM is the interface between the scheduler and the database.
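
The division of labour among the three modules might be sketched as below. This is a simplified illustration under stated assumptions, not the model of Fig. 2 itself: locks are exclusive only, a transaction that cannot get a lock fails instead of waiting, and failure recovery is omitted. All class and method names are hypothetical.

```python
class DataManager:
    """Carries out reads and writes on the database itself."""
    def __init__(self):
        self.db = {}

    def read(self, obj):
        return self.db.get(obj)

    def write(self, obj, value):
        self.db[obj] = value

class Scheduler:
    """Grants and releases exclusive locks on data objects."""
    def __init__(self, dm):
        self.dm = dm
        self.locks = {}   # obj -> id of the transaction holding the lock

    def acquire(self, txn, obj):
        holder = self.locks.setdefault(obj, txn)
        return holder == txn

    def release_all(self, txn):
        self.locks = {o: t for o, t in self.locks.items() if t != txn}

    def read(self, txn, obj):
        if not self.acquire(txn, obj):
            raise RuntimeError(f"{obj} is locked by another transaction")
        return self.dm.read(obj)

    def write(self, txn, obj, value):
        if not self.acquire(txn, obj):
            raise RuntimeError(f"{obj} is locked by another transaction")
        self.dm.write(obj, value)

class TransactionManager:
    """Interface between users and the database system."""
    def __init__(self, scheduler):
        self.scheduler = scheduler

    def execute(self, txn, actions):
        results = []
        try:
            for op, obj, *value in actions:
                if op == "r":
                    results.append(self.scheduler.read(txn, obj))
                else:
                    self.scheduler.write(txn, obj, value[0])
        finally:
            self.scheduler.release_all(txn)   # locks are released at the end
        return results

dm = DataManager()
sched = Scheduler(dm)
tm = TransactionManager(sched)
results = tm.execute("T1", [("w", "x", 5), ("r", "x")])
```

In this sketch every request passes from the TM through the scheduler to the DM, so no read or write reaches the database without holding the corresponding lock.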

The concurrency control model of a distributed database system is shown in Fig. 3.


The Problem of Concurrency Control

In a typical database system, transactions are executed concurrently. Since these transactions may access the same data objects, several anomalous situations may arise if the interleaving of actions is not controlled in some orderly way.

Inconsistent Retrieval:

Inconsistent retrieval occurs when a transaction reads some data objects of a database before another transaction has completed its modification of those data objects.
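
A hypothetical interleaving illustrates the problem. Assume T1 transfers 30 from account A to account B while T2 computes the total of both accounts; the accounts and amounts are invented for the example.

```python
db = {"A": 100, "B": 50}

# The statements below are listed in the order in which they execute.
db["A"] = db["A"] - 30          # T1: debit A
seen_a = db["A"]                # T2: read A (already debited)
seen_b = db["B"]                # T2: read B (not yet credited)
db["B"] = db["B"] + 30          # T1: credit B

total_seen_by_t2 = seen_a + seen_b
actual_total = db["A"] + db["B"]
print(total_seen_by_t2, actual_total)
```

T2 reports a total that never existed in any consistent state of the database, although T1 itself leaves the database consistent.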

Inconsistent Update:

Inconsistent update occurs when several transactions read and write a common set of data objects of a database in an uncontrolled interleaving, leaving the database in an inconsistent state.
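
As a hypothetical example, assume the consistency assertion x == y, a transaction T1 that adds 1 to both objects, and a transaction T2 that doubles both objects. Each transaction preserves the assertion when run alone, but the interleaving below does not:

```python
db = {"x": 10, "y": 10}          # consistency assertion: x == y

# The statements below are listed in the order in which they execute.
db["x"] = db["x"] + 1            # T1: x := x + 1
db["x"] = db["x"] * 2            # T2: x := x * 2
db["y"] = db["y"] * 2            # T2: y := y * 2
db["y"] = db["y"] + 1            # T1: y := y + 1

consistent = db["x"] == db["y"]
print(db, consistent)
```

Running T1 entirely before T2, or T2 entirely before T1, would leave x equal to y. Only the uncontrolled interleaving breaks the assertion.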


Date: May 2, 1995
