Two & Three Phase Commit Protocols - A Summary

Ashish Prakash


Motivation

These protocols extend our discussion of fault tolerance, rollback and recovery (they add a new slant to rollback and recovery). There are two approaches to making a system fault tolerant:

Commit protocol

These protocols adopt the second approach to making a system fault tolerant. They deal with agreement on the state in which to leave the system. If the transaction is successful, all participants must agree that it was successful; if it failed, they must agree on the state to leave the system in. (This is similar to an agreement protocol.)

Commit: accept
Not commit: abort

Assumptions

Terminology

As far as commit protocols go, our main focus is on atomicity.

Two-Phase Commit Protocol

A good starting point, though it really doesn't provide much, because:

FSM description of the co-ordinator and cohorts (two-phase commit)

From Mike Duckett's web page

The Protocol
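As an illustration of the basic two-phase flow (a vote phase followed by a decision phase), here is a minimal, failure-free sketch. The names `Cohort`, `will_accept` and `two_phase_commit` are hypothetical, the state letters follow the usual q (initial), w (wait), a (abort), c (commit) convention, and local function calls stand in for messages. Because no failures are modelled, it does not show the blocking problem, which appears when the co-ordinator fails after the cohorts have voted accept and are left waiting in w.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical cohort; `will_accept` fixes how it will vote."""
    name: str
    will_accept: bool
    state: str = "q"  # q = initial, w = wait, a = abort, c = commit

    def on_commit_request(self) -> str:
        # Phase 1: vote. A cohort that votes abort can abort unilaterally.
        if self.will_accept:
            self.state = "w"
            return "accept"
        self.state = "a"
        return "abort"

    def on_decision(self, decision: str) -> None:
        # Phase 2: apply the co-ordinator's global decision.
        self.state = "c" if decision == "commit" else "a"


def two_phase_commit(cohorts: list) -> str:
    # Phase 1: send the commit request to every cohort and collect the votes.
    votes = [c.on_commit_request() for c in cohorts]
    # Phase 2: commit only if every cohort voted accept, otherwise abort.
    decision = "commit" if all(v == "accept" for v in votes) else "abort"
    for c in cohorts:
        c.on_decision(decision)
    return decision


group = [Cohort("A", True), Cohort("B", False)]
print(two_phase_commit(group), [c.state for c in group])  # abort ['a', 'a']
```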

Drawbacks:

How to fix these problems? The three-phase commit protocol

More Terminology

What causes blocking?

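In the two-phase commit protocol, the concurrency set of a cohort's wait state contains both final states (commit and abort). Therefore, while one site is waiting, other sites/nodes may already have finished the protocol and reached either final state, so the waiting site cannot safely decide on its own if the co-ordinator fails. Some "leeway" is required to make a decision. Thus, an extra state is needed as the pre-final state: a "prepare" state. With it, no concurrency set contains both accept and abort.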
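The blocking argument can be checked mechanically. The sketch below, assuming a failure-free model with one co-ordinator and two cohorts, enumerates every reachable global state and derives the concurrency set of a cohort state (the states other sites may occupy at the same time). Simplifying assumptions: messages are not modelled explicitly (a cohort reacts to the co-ordinator's current state as if it had received the corresponding message), co-ordinator and cohort states share the letters q, w, p, a, c, and `successors`, `reachable` and `concurrency_set` are hypothetical helpers.

```python
from collections import deque


def successors(g, three_phase):
    """Global states reachable in one step; g = (coordinator, cohort1, cohort2, ...)."""
    coord, cohorts = g[0], list(g[1:])
    target = "p" if three_phase else "c"  # state entered after a unanimous accept
    # Co-ordinator moves.
    if coord == "q":
        yield ("w", *cohorts)                      # send commit request
    elif coord == "w" and "a" in cohorts:
        yield ("a", *cohorts)                      # some cohort voted abort
    elif coord == "w" and all(s == "w" for s in cohorts):
        yield (target, *cohorts)                   # every cohort voted accept
    elif coord == "p" and all(s == "p" for s in cohorts):
        yield ("c", *cohorts)                      # every cohort acknowledged prepare
    # Cohort moves, each triggered by the co-ordinator's (implied) message.
    for i, s in enumerate(cohorts):
        if s == "q" and coord != "q":
            moves = ["w", "a"]                     # vote accept or abort
        elif s == "w" and coord in ("a", target):
            moves = [coord]                        # follow abort / prepare / commit
        elif s == "p" and coord == "c":
            moves = ["c"]
        else:
            moves = []
        for m in moves:
            new = cohorts.copy()
            new[i] = m
            yield (coord, *new)


def reachable(num_cohorts, three_phase):
    """Breadth-first search over global states from the all-initial state."""
    start = ("q",) * (num_cohorts + 1)
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in successors(frontier.popleft(), three_phase):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def concurrency_set(local, three_phase, num_cohorts=2):
    """States other sites can be in while cohort 1 is in state `local`."""
    result = set()
    for g in reachable(num_cohorts, three_phase):
        if g[1] == local:
            result.update(g[:1] + g[2:])
    return result


for label, tp in (("2PC", False), ("3PC", True)):
    print(label, "w:", sorted(concurrency_set("w", tp)))
# 2PC w: ['a', 'c', 'q', 'w']
# 3PC w: ['a', 'p', 'q', 'w']
```

Under this model the wait state of two-phase commit can coexist with both a and c, whereas in three-phase commit neither w nor p can, which is the property the prepare state is meant to provide.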

Three-Phase Commit Protocol

FSM


From Srinivas R. Gaddam's page

Phase I

Co-ordinator: sends a commit-request message to all cohorts and moves to the next state.
Cohorts: receive the commit request from the co-ordinator, reply with abort or accept, and move to the next state.

Phase II

Co-ordinator: gathers all the cohorts' replies. If any abort message is received, it sends out the final abort message; abort thus works just as in the two-phase commit protocol. Otherwise, it sends a prepare-to-commit message.
Cohorts: either receive the abort message and abort the transaction, or receive the prepare-to-commit message and acknowledge it.

Phase III

Co-ordinator: gets all the acknowledgements and sends the final commit message.
Cohorts: receive the final commit message and commit the transaction.
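The three phases above can be sketched in the same failure-free style. This is a minimal illustration, not a complete implementation: method names such as `on_prepare` are hypothetical, local calls stand in for messages and acknowledgements, and timeouts are omitted. The function returns the decision together with the co-ordinator's state trace.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical cohort; `will_accept` fixes its Phase I reply."""
    name: str
    will_accept: bool
    state: str = "q"  # q, w (wait), p (prepared), a (abort), c (commit)

    def on_commit_request(self) -> str:
        self.state = "w" if self.will_accept else "a"
        return "accept" if self.will_accept else "abort"

    def on_prepare(self) -> None:
        self.state = "p"  # entering p stands for sending the acknowledgement

    def on_abort(self) -> None:
        self.state = "a"

    def on_commit(self) -> None:
        self.state = "c"


def three_phase_commit(cohorts: list) -> tuple:
    trace = ["q"]  # co-ordinator's states, for illustration
    # Phase I: send the commit request; each cohort replies accept or abort.
    replies = [c.on_commit_request() for c in cohorts]
    trace.append("w")
    # Phase II: any abort reply leads to a global abort, as in two-phase commit.
    if "abort" in replies:
        for c in cohorts:
            c.on_abort()
        trace.append("a")
        return "abort", trace
    # Otherwise send prepare-to-commit; every cohort acknowledges.
    for c in cohorts:
        c.on_prepare()
    trace.append("p")
    # Phase III: all acknowledgements are in (no failures modelled), so commit.
    for c in cohorts:
        c.on_commit()
    trace.append("c")
    return "commit", trace


print(three_phase_commit([Cohort("A", True), Cohort("B", True)]))
# ('commit', ['q', 'w', 'p', 'c'])
```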

No Blocking

Timeout periods are set so that if an expected message is not received within that time, the sender is assumed to have failed, and the remaining sites are driven to their next states by "timeout transitions". Therefore, no blocking occurs.
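A timeout transition amounts to waiting for a message with a deadline. The sketch below shows only the co-ordinator's wait state in Phase II, assuming (as in the usual three-phase commit timeout transitions) that a co-ordinator which times out while collecting votes aborts; this is safe because no site can have committed before every vote is in. Cohorts are threads, a "crashed" cohort simply never replies, and the names `cohort`, `collect_votes` and `run` are hypothetical.

```python
import queue
import threading
import time


def cohort(name, inbox, replies, crashed=False):
    """Hypothetical cohort thread: waits for the commit request and votes accept.
    A crashed cohort never replies, modelling a failed site."""
    if inbox.get() == "commit-request" and not crashed:
        replies.put((name, "accept"))


def collect_votes(n_cohorts, replies, timeout):
    """Co-ordinator's wait state: gather all votes, or take the timeout transition."""
    votes = {}
    deadline = time.monotonic() + timeout
    while len(votes) < n_cohorts:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "abort"  # timeout: assume a site failed and abort
        try:
            name, vote = replies.get(timeout=remaining)
        except queue.Empty:
            return "abort"  # timeout: assume a site failed and abort
        votes[name] = vote
    return "prepare" if all(v == "accept" for v in votes.values()) else "abort"


def run(crash_b, timeout=0.5):
    replies = queue.Queue()
    inboxes = {"A": queue.Queue(), "B": queue.Queue()}
    threads = [
        threading.Thread(target=cohort, args=(n, box, replies, crash_b and n == "B"))
        for n, box in inboxes.items()
    ]
    for t in threads:
        t.start()
    for box in inboxes.values():
        box.put("commit-request")
    decision = collect_votes(len(inboxes), replies, timeout)
    for t in threads:
        t.join()
    return decision


print(run(crash_b=False))  # prepare
print(run(crash_b=True))   # abort (after about 0.5 s)
```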

Independent recovery

Assumption: a site/node comes up in the same state it failed in (it learns from stable storage which state that was).

NOTE: a proof of correctness by contradiction for these protocols can be found in "Advanced Concepts in Operating Systems" by Mukesh Singhal and Niranjan G. Shivaratri.
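The recovery assumption relies on each site recording its current protocol state in stable storage. A minimal sketch, assuming a local file stands in for stable storage and a JSON record format chosen only for illustration (the class `StableState` is hypothetical): the state is written to a temporary file, forced to disk with `os.fsync`, and renamed over the previous record with `os.replace`, so on POSIX file systems a crash leaves either the old or the new state. A site would typically save its new state before sending the message that announces the transition.

```python
import json
import os
import tempfile


class StableState:
    """Hypothetical stable-storage record of one site's protocol state."""

    def __init__(self, path):
        self.path = path

    def save(self, state):
        # Write the new record to a temporary file in the same directory,
        # force it to disk, then atomically rename it over the old record.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"state": state}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        # A fully durable version would also fsync the directory on POSIX.

    def load(self, default="q"):
        # On restart, the site resumes in the last state it recorded.
        try:
            with open(self.path) as f:
                return json.load(f)["state"]
        except FileNotFoundError:
            return default


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "site-A.json")
    StableState(path).save("w")      # record the wait state before voting accept
    print(StableState(path).load())  # w  (what the site sees after a restart)
```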
Last updated: March 31st 1997