Questions for Form B

Explain how the Ide dec-hi feedback scheme implements both (1) query expansion and (2) term (re)weighting based on feedback. That is, where do the new terms come from, and where do the new weights come from? You may answer by explaining the formula and the processing involved.

Using an accumulator and pruning modifications for retrieval / ranking:
a) is especially useful when there are long queries.
b) gives a more exact computation of similarity than other methods.
c) works well when rapid response is needed, when there are millions of documents involved, when primary memory is very tiny, and when queries are short lists of high frequency terms.
d) all of the above.
e) none of the above.

Some versions of the probabilistic weighting scheme add values like 0.5 or 1.0 in appropriate places (e.g., see p. 256 in the text). This is based on the Jeffreys prior or other arguments and addresses the statistical problems of dealing with small samples. There are also modifications needed when relevance feedback is used but no relevant documents are found in the first search. Please explain why these types of refinements are necessary: what problems they avoid, how much effect they are likely to have on effectiveness, and what user or system behavior might help eliminate the need for them. [Hint: this is a thought question, to see if you can apply what you have read.]
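
For reference, one widely used form of such a weight is the Robertson/Sparck Jones relevance weight with 0.5 added to each of its four counts. The sketch below assumes that form; the function name, the parameter k, and the example counts are illustrative only, and the version on p. 256 of the text may differ in detail.

```python
import math


def rsj_weight(N, n, R, r, k=0.5):
    """Relevance weight of one term, with a constant k added to each count.

    N: number of documents in the collection
    n: number of documents containing the term
    R: number of known relevant documents
    r: number of known relevant documents containing the term
    k: the added constant (0.5 here; an assumption, other values are used too)
    """
    numerator = (r + k) * (N - n - R + r + k)
    denominator = (n - r + k) * (R - r + k)
    return math.log(numerator / denominator)


# Illustrative counts only (not taken from the text).
print(rsj_weight(N=1000, n=50, R=10, r=4))
```

Changing k, or setting R and r to the counts observed after a first search, shows how the weight responds to the quantities the question asks about.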

In the vector space model, similarity is often measured with the cosine formula. To evaluate that formula, we compute:
a) the inner product of query and document vectors.
b) the length of the document vector.
c) the length of the query vector.
d) all of the above.
e) exactly 2 out of choices a through c (which?).
f) none of the above.

Experimental studies have shown that adding too many new terms to a query may reduce search effectiveness in a feedback situation. Briefly explain two ways to reduce the number of new terms that are to be added.


fox@cs.vt.edu
Tue Aug 30 04:42:03 EDT 1994