For a thorough introduction to the probabilistic model, covering work through 1979, see Chapter 6 of the monograph by Van Rijsbergen.
Let us assume (following Robertson) that:
The relevance of a document to a request is independent
of other documents in the collection.
Then, following Maron and Cooper, we can now state
the probability ranking principle:
If a reference retrieval system's response to each request is a ranking
of the documents in the collection in order of decreasing probability of
relevance to the user who submitted the request, where the probabilities
are estimated as accurately as possible on the basis of whatever data
have been made available to the system for this purpose, the overall
effectiveness of the system to its user will be the best that is
obtainable on the basis of those data.
Each document is represented by a binary vector x = (x1, x2, ..., xn), where xi = 0 or 1 indicates absence or presence of the ith index term.
Now we consider the two events:
w1 = document is relevant
w2 = document is non-relevant.
Since we cannot estimate P(wi/x) directly, we use Bayes' Theorem
P(wi/x) = P(x/wi) P(wi) / P(x)
where P(wi) is the prior probability of relevance (i = 1) or non-relevance (i = 2), and the factor P(x/wi) is what is commonly known as the likelihood of relevance or non-relevance given x. Assuming that the index terms occur independently of one another within the relevant and within the non-relevant documents, the likelihood factors as
P(x/wi) = P(x1/wi) P(x2/wi) ... P(xn/wi)
Then, to simplify the equations we define:
pi = Prob (xi = 1/w1)
qi = Prob (xi = 1/w2).
The likelihood functions then are
P(x/w1) = PRODUCT(i=1 to n) (pi**xi) ((1 - pi)**(1-xi))
P(x/w2) = PRODUCT(i=1 to n) (qi**xi) ((1 - qi)**(1-xi))
Thus, for example, P((0,1,1,0,0,1)/w1) = (1 - p1)p2p3(1 - p4)(1 - p5)p6.
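The worked example above can be checked with a short sketch; the per-term probabilities used here are made-up illustrative values, not from the text:

```python
from math import prod

def likelihood(x, p):
    """P(x | w) under term independence:
    the product over i of p_i**x_i * (1 - p_i)**(1 - x_i)."""
    return prod(pi if xi else 1 - pi for xi, pi in zip(x, p))

# The example vector x = (0,1,1,0,0,1) gives
# (1 - p1) p2 p3 (1 - p4) (1 - p5) p6.
p = [0.8, 0.5, 0.4, 0.3, 0.2, 0.9]  # assumed values, for illustration only
x = [0, 1, 1, 0, 0, 1]
print(likelihood(x, p))  # 0.2 * 0.5 * 0.4 * 0.7 * 0.8 * 0.9, about 0.0202
```

Note that the product shrinks quickly with the number of terms, which is one reason implementations work with logs, as the derivation below does.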
Going back to Bayes' Theorem, we substitute these likelihoods, form the ratio P(w1/x) / P(w2/x), and take logs. Terms that do not involve the xi are the same for every document and do not affect the ranking, so we are left with a linear discriminant function in which the coefficient for
xi (which is essentially a term weight) becomes:
log [ pi (1 - qi) ] / [ qi (1 - pi) ]
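As a minimal sketch, the term weight above can be computed directly; the values of pi and qi here are invented for illustration:

```python
from math import log

def term_weight(p_i, q_i):
    """Coefficient of x_i in the linear discriminant:
    log[ p_i (1 - q_i) / ( q_i (1 - p_i) ) ]."""
    return log(p_i * (1 - q_i) / (q_i * (1 - p_i)))

# A term more likely in relevant than in non-relevant documents
# (p_i > q_i) gets a positive weight; an uninformative term
# (p_i == q_i) gets weight zero.
print(term_weight(0.6, 0.2))  # positive
print(term_weight(0.3, 0.3))  # 0.0
```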
If we estimate pi and qi from a set of R known relevant documents as
pi = r / R
qi = (n - r) / (N - R)
we get the F4 formula of Robertson and Sparck Jones
log [ { r / ( R - r ) } / { ( n - r ) / ( N - n - R + r ) } ]
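A sketch of the F4 weight follows, using the standard reading of the counts (r relevant documents containing the term, R relevant documents in total, n documents containing the term, N documents in total); the counts themselves are hypothetical:

```python
from math import log

def f4_weight(r, R, n, N):
    """F4 relevance weight of Robertson and Sparck Jones:
    log[ (r / (R - r)) / ((n - r) / (N - n - R + r)) ].
    Raw counts can make a cell zero (e.g. r == R); practical
    systems guard against this, commonly by adding a small
    correction to each cell, but the bare formula is used here."""
    return log((r / (R - r)) / ((n - r) / (N - n - R + r)))

# Hypothetical counts: 20 known relevant documents, 8 containing
# the term; the term occurs in 50 of 1000 documents overall.
print(f4_weight(r=8, R=20, n=50, N=1000))  # positive: a good indicator term
```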
with the following variable definitions