Agreement Protocols
by Ohm Sornil
Introduction
Indistributed systems, where sites (or processors) often compete as well as
cooperate to achieve a common goal, it is often required that sites reach
mutual agreement. Reaching an agreement typically requires that sites have
knowledge about the values of other sites. When the systen is free from
failures, an agreement can easily be reached among the sites.
However, when the system is prone to failure, this method does not work
because faulty processors can send conflicting values to other processors
preventing them from reaching an agreement. A process of reaching an
agreement is needed in the presence of faults, and the process is called
an agreement protocol [1].
The System Model
We study agreement problems under the following model:
- There are n processors in the system, at most m can be faulty.
- Processors communicate to all other processors directly by message passing
(the system is logically fully connected).
- Identity of the sender of a message is always known to the receivers.
- Communication medium is reliable.
- Models of computation are synchronous.
For simplicity, we assume that only two values, 0 and 1, will be used to reach
an agreement, and only processors are prone to failures.
Model of Processor Failures
There are three modes in which a processor can fail:
- Crash fault. A processor stops functioning and never resumes operation.
- Omission fault. A processor omits to send messages to some processors.
- Malicious fault(Byzantine faults). A processor behaves randomly and arbitrarily.
When a faulty processor refuses to send a message, we can assume that a
nonfaulty processor simply chooses an arbitrary value and acts as if the
expected message has been received.
A Classification of Agreement Problems
Three well known agreement problems in distributed systems are the Byzantine
agreement problem, the consensus problem, and the interactive consistency
problem.
The Byzantine Agreement Problem
An arbitrary chosen processor, called the source processor, broadcasts its initial value.
The solution to this problem should meet the following two objectives:
- Agreement. All nonfaulty processors agree on the same value.
- Validity. If the source is nonfaulty, all nonfaulty processors should agree
on the initial value of the source.
Note:
- If the source is faulty, all nonfaulty processors agree on any common value.
- It is irrelevant what value faulty processors agree on or whether they agree on a value at all.
The Consensus Problem
Every processor broadcasts its initial value to all others. Initial values of the processors may be different.
- Agreement. All nonfaulty processors agree on the same single value.
- Validity. If the initial value of every nonfaulty processor is v, then every
nonfaulty processor must agree on v.
Note:
- If initial values of nonfaulty processors are different, all
nonfaulty processors can agree on any common value.
- We don't care what value faulty processors agree on.
The Interactive Consistency Problem
Every processor broadcasts its initial value to all others. Initial values of the processors may be different.
- Agreement. All nonfaulty processors agree on the same vector, (v_1,v_2,..., v_n).
- Validity. If the ith processor is nonfaulty and its initial value is v_i, then
the ith value to be agreed on by all nonfaulty processors must by v_i.
Note:
- If the jth processor is faulty, all nonfaulty processors can agree
on any common value for v_j.
- Value that faulty processors agree on is irrelevant.
Summary of The Three Agreement Problems
| Problem | Byzantine Agreement | Consensus | Interactive Consistency |
| Who initiates the value | One processor | All processors | All processors |
| Final agreement | Single value | Single value | A vector of values |
Relations Among the Agreement Problems
All three problems are closely related. For example, the Byzantine agreement
problem is a special case of the interactive consistency problem, in which
the initial value of only one processor is of interest. On the other hand,
if each of the n processors runs a copy of the Byzantine agreement protocol,
the interactive consistency problem is solved. Likewise, the consensus problem
can be solved using the solution of the interactive consistency problem. This
is because all nonfaulty processors can compute the value that is to be agreed
upon by taking the majority value of the common vector that is computed
by an interactive consistency protocol, or by choosing a default value if a
majority does not exist.
Solutions To The Byzantine Agreement Problem
The Byzantine agreement problem is also referred to as the Byzantine generals problem
because the problem resembles a situation where a team of generals in an
army is trying to reach agreement on an attack plan. The generals are located
at geographically distant positions and communicate only through messengers.
Some of the generals are traitors and try to prevent loyal generals from
reaching an agreement by deliberately transmitting erroneous information.
This problem is first defined and solved (under processor failures)
by Lamport et al. [2]. It is obvious that all the processors must exchange
the values through messages to reach a consensus. Processors send their
values to other processors and relay received values to others. During
the execution of the protocol, faulty processors may confuse other processors
by sending them conflicting values or by relaying to them fictitious values.
An Impossibility Result
Let us first see that in a three node system, if one node is faulty,
this problem cannot be solved if oral message passing is employed by nodes.
Suppose the nodes need to agree to a Boolean value: true(1) or false(0).
Let us consider the two scenarios shown in Fig. 1 with one of the
three nodes as faulty [3].
 |
| Figure 1: Two Scenarios |
In the first scenario, one of the receiving nodes j is faulty, and though
the sender sends it a 1, it transmits to node i that a 0 was sent by the
sender. In the second scenario, the sending node itself is faulty, and it
sends a 1 to node i and a 0 to node j, which the node j faithfully forwards
to i. Both these situations are indistinguishable to node i, and it cannot
decide whether the value sent by the sender should be considered as correct,
or the value forwarded by node j. If it decides to accept the value
sent by the sender (i.e. 1), then in the same scenario node j will accept
the value 0. This violates the second requirement of the Byzantine problem.
It has been shown that with oral messages it is impossible to solve this
problem unless more than two-third of the nodes are nonfaulty. That is,
the number of faulty nodes has to be less than one-third of the total
number of nodes. To cope with m faulty nodes, at least 3m+1 nodes are needed
to reach consensus.
Lamport-Shostak-Pease Algorithm
Lamport et al.'s algorithm [2], referred to as the Oral Message algorithm
OM(m), m > 0, solves the Byzantine agreement problem for 3m+1 or more
processors in the presence of at most m faulty processors. Let n denote
the total number of processors ( clearly, n>= 3m+1 ). The algorithm is
recursively defined as follows:
Algorithm OM(0).
- The source processor sends its value to every processor.
- Each processor uses the value it receives from the source. ( If it
receives no value, then it uses a default value of 0.)
Algorithm OM(m), m > 0.
- The source processor sends its value to every processor.
- For each i, let v_i be the value processor i receives from the source.
(If it receives no value, then it uses a default value of 0.). Processor
i acts as the new source and initiates Algorithm OM(m-1) wherein it sends
the value v_i to each of the n-2 other processors.
- For each i and each j (not equal to i), let v_j be the value processor i received
from processor j in Step 2 using Algorithm OM(m-1). (If it receives no value
, then it uses a default value of 0.). Processor i uses the value
majority(v_1,v_2,...,v_(n-1)).
The processors are successively
devided into smaller and smaller groups and the Byzantine agreement is
recursively achieved within each group of processors. Step 3 is executed
during the folding phase of the recursion, where a majority function is
applied to select the majority value out of the values recieved in a round
of message exchange ( step 2 ). The function majority (v_1, v_2,..., v_(n-1))
computes a majority value of the values v_1, v_2, ..., v_(n-1) if it exists
(otherwise, it returns the default value 0).
The execution of the algorithm OM(m) invokes n-1 separate executions of
the algorithm OM(m-1), each of which invokes n-2 executions of the
algorithm OM(m-2), and so on. Therefore, there are (n-1)(n-2)(n-3)...(n-k)
separate executions of the algorithm OM(m-k), k = 1,2,3,...,m+1. The message
complexity of the algorithm is O(n^m).
 |
| Figure 2: Algorithm OM(1) |
To better understand the protocol, consider a system with 1 faulty node and
a total of 4 nodes. First consider the case where the transmitter is reliable
, and one of the receivers (node 3) is faulty. In this case, the transmitter
during its execution of OM(1) will send the same value (say x) to the three
remaining nodes (one of which is faulty). During execution of OM(0), the
two reliable nodes will forward the value they received from the transmitter
(x) faithfully to other nodes. However, the faulty node can send any value to
others. Suppose it sends y to nodes 1 and 2.
In this case, node 2 (and node 1) will
get the values (x, x, y). The majority of this is x. Hence, the value agreed
to by node 2 is the same as the one sent by the reliable transmitter node.
This situation for node 2 is shown in Scenario 1 of Fig 2.
Now let us consider the case where the transmitter itself is faulty, and
the receivers are reliable. This is shown in Scenario 2 of Fig 2. Suppose
the faulty transmitter sends x to node 1, y to node 2, and z to node 3
during OM(1). Since the nodes 1, 2, and 3 are reliable, during OM(0) they
will transmit faithfully to others the value they received from the
transmitter. Hence, each one of then will agree on the values (x, y, z).
Since each executes the same majority algorithm, they will agree on the same
value, regardless of the actual values of x, y, and z, thereby satisfying
the requirements.
References
1. Singhal, M. and Shivaratri,N., Advanced Concepts in Operating
Systems, McGraw-Hill 1994, pp.330-337.
2. Lamport, L., R.Shostak, and M.Pease, "The Byzantine Generals Problem," ACM Transactions on Programming Languages and Systems, July 1982.
3. Jalote, P., Fault Tolerance in Distributed Systems, Prentice Hall
1994, pp.78-85.