Global State Recording Algorithm :GSRA

- by Stephen Williams (figures added and revised by D. Kafura)

Background

In a distributed system where shared memory and system-wide clocks do not exist, the process of determining an instantaneous global state becomes difficult.

As was discovered in the talk on Lamport's Logical Clocks, even if every process could record their state at the same time, messages that are intransit at the time would not be included and the actual state of the system would be incorrect.

We have also seen methods of finding causal relationships through the use of Vector Clocks and Causal Message Ordering. Both of these

Introduction

The idea behind Chandy and Lamport's global state recording algorithm is that we can record a consistent state of the global system if we know that all messages that have been sent by one process have been received by another. This is accomplished by the use of a Marker which traverses the distributed sytem across all channels. This Marker, in turn, causes each process to record a snapshot of itself and, eventually, of the entire system.

The algorithm does not depend on a global clock or, for that matter, any clock. As long as the Marker can traverse the entire network in finite time, the algorithm works. This raises the question of whether or not a state that is recorded over a long period of time can relay any information about the system.

The primary benifit of the global state recording algorithm, is the ability to detect a stable property of the distributed system. Such a property could be deadlock, or termination, or in the case of our examples, the amount of money in a closed loop financial system.

Assumptions

This algorithm is based on several assumptions. Some are based in real life, and some are not. However, items covered previously in this class should allow one to use the algorithm under most, if not all, circumstances.

The Global State Recording ALGORITHM

How the algorithm is implemented

The key idea of the above algorithm is that, for EVERY channel/process intersection, EXACTLY one marker must pass. So, if a process starts the algorithm, it sends a Marker on each outgoing channel and then records messages on each incoming channel until a Marker appears on it. Once a marker has arrived on each incoming channel, that process is finished recording its part of the snapshot.

I prefer to think of the recorded state to be a "virtual" state. That is, once the first Marker has arrived at a process and the actual state is recorded, the actual state will, most likely, not be the same as the virtual state. Sooooooo, how do I do this???

Once a process has recorded its state initially --either it has started the GSRA or has received a marker--, it IMMEDIATELY sends a marker on EVERY outgoing channel.

Then, we record incoming messages on each incoming channel UNTIL a marker arrives on that channel. When the marker arrives on an incoming channel, we update the virtual state by the effect of those messages and stop recording THAT channel.

Once a Marker has arrived on ALL incoming channels, we are done. The virtual state recorded is the one that another algorithm will use in determining any stable properties of the system.

Why does it work?

The effect of the virtual state recorded is the same as if my algorithm had been to STOP sending messages after receiving a marker until the global state had been recorded. The idea is to push all of the messages that are currently in a channel to a process to be included in the process state. However, if we want an effective algorithm, we cannot bring a large distributed system to a grinding halt while we "push" messages out of the channels to get a consistent view. Therefore, our virtual state IGNORS outgoing messages that occur after a Marker has been sent. The virtual (recorded) state of each process is the same as if we had said, "No more outgoing messages until we know that all messages are out of any channels associated with you."

This does not,however, mean that the entire distributed system is finished or that any useful data has been distilled. It only means that, if another algorithm takes the states recorded at each process once the GSRA has completed on all processes, the state will be consistent. If a stable property was present in the system during the process of recording the snapshot then that property will exist IN the snapshot.

As has been stated, the recorded state may not be a state the system was actually in, however, it is a "possible" state for the system:

Example

Pictured below is a system with three nodes. All three processes begin with $500 and the channels are empty. Therefore the stable property of dollars in the system is $1500. Process p sends out $10 to process q and then decides to initiate the global state recording algorithm: it records its current state ($490) and send out a marker along channel c1. Meanwhile, p rocess q has sent $20 to p along channel c2 and $10 to r along channel c3.

Process q receives the the $10 transfer (increasing its value to $480) and then recieves the marker on channel c1. Because it received a marker, process q records its state as $480 and sends markers along each of its outgoing channels c2 and c3. Meanwhile, process r has sent $25 along c4. Note that it does not matter if r sent the message before or after q received the marker.

In Step 3, process r receives the $10 transfer and the marker from channel c3. Therefore, r updates its state to $485 and records this state. Process r also sends a marker on its outgoing channel, c4. Meanwhile, process p has sent $20 to process q along channel c1.


In Step 4, process q receives the $20 transfer on channel c1 and updates its value to $500. Notice that process q does not change is recorded state value. Also process p has receives the $20 transfer on channel c2. Process p records the $20 transfer as part of its recorded state (because it received this after the state recording algorithm had begun and the marker on that channel had not yet been received). Process p then receives the marker on channel c2 and can stop recording any further messages on that channel .


In Step 5, process p receives the $25 on channel c4. It adds this to its recorded state and also changes its current value from $480 to $505. When process p receives the marker on channel c4, the state recording is complete because all processes have received markers on all of their input channels. The final recorded state is shown in the table below.




NOTE: This algorithm, in no way, deals with what to do with the record once it is collected. Some other algorithm is required to say "Once you have collected your state send the info to the Auditor's node...blah...blah...blah."

References:

Singhal and Shivaratri -- the textbook --
Singhal, M. and N.Shivaratri, Advanced Concepts in Operating Systems, McGraw-Hill 1994, pp.112-113.
Chandy and Lamport
Chandy, K.M., and L.Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, v.9 no.3 1991 pp. 272-314.

Button Go Back to the Operating Systems page.



Glossary:

Channel
A channel is a communications channel. In this algorthm, as well as in practice, a channel can carry messages only in one direction. Messages are assumed to remain in a channel a finite period of time and exit the channel in the same order as they entered.

Marker
A marker is a special message used in the global state recording algorithm to indicate that a system snapshot is in progress. Any process in the system requires that a marker cross each process/channel boundary EXACTLY once. A marker carries no information (other than its identity as a marker).

Process
A process is a seperate sequence of operations in a distributed system. More than one thread may exist on one physical machine, or they may exist on seperate machines. For the purpose of this algorithm, a process is assumed to have a seperate memory and a seperate clock from all other processes.

Stable Property
A stable property of a distributed system is a property that, once true, remains true for any subsequent global state of the system.

   Let S be a global state of the distributed system D. 
   Let y be a property of state S:  y(S)
   The predicate y is a stable property of D if y(S) -> y(S') 
   for all states S' reachable from S.

   If S(i) is the global state at the beginning of the snapshot
   and S(f) is the state at the end of the snapshot then:

      S is a permutation of S(i), and
      S(f) is a permutation of S
Snapshot
A snapshot of the system is the global state recorded with the algorithm. It is recorded over a finite period of time and may not be any actual state the system was in. It is, however, a possible state for the system. Think of the snapshot as an actual picture taking process. If you want to take a panorama picture of a flock of birds or a group of cars on the highway, you need to take several pictures. The state of the scene however is that there are a certain number of birds (or cars). If you want to capture the state, you must make sure that birds (or cars) are not left out of any frame and that there are none that are duplicated.

Button Go Back to the Operating Systems page.

Last updated: 8 Mar. 1995 / williams@csgrad.cs.vt.edu