Questions for Form C

The SIRE system did ranking after a Boolean query had selected documents that matched the query expression. With the P-norm scheme, if one starts with a regular Boolean query and simply uses a low p-value everywhere, one can rank all documents that have any of the query terms, thus including documents that would not be retrieved if SIRE were processing that query. Would the P-norm approach have higher recall than the SIRE approach, given the same typical Boolean query to start with? Which of the two schemes will require more processing? Explain your answers briefly.
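As a reminder of the scheme this question refers to, the following is a minimal sketch of P-norm similarity. It assumes equal query-term weights and document term weights in [0, 1]; the function names `pnorm_or` and `pnorm_and` are hypothetical and are not taken from any particular system.

```python
def pnorm_or(doc_weights, p):
    """P-norm similarity of a document to an OR clause (equal query weights assumed)."""
    n = len(doc_weights)
    return (sum(w ** p for w in doc_weights) / n) ** (1.0 / p)


def pnorm_and(doc_weights, p):
    """P-norm similarity of a document to an AND clause (equal query weights assumed)."""
    n = len(doc_weights)
    return 1.0 - (sum((1.0 - w) ** p for w in doc_weights) / n) ** (1.0 / p)


# A document that contains only one of the two terms in "A AND B".
partial_match = [1.0, 0.0]

# With p = 1 the AND clause behaves like a simple average of the term weights.
low_p_score = pnorm_and(partial_match, 1)

# With a large p the AND clause approaches strict Boolean behavior,
# so the same document scores close to zero.
high_p_score = pnorm_and(partial_match, 100)
```

With a low p-value the partially matching document still receives a nonzero score and can therefore be ranked, which is the behavior described in the question.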

When term-term clustering is used to try to identify terms to add for query expansion:
a) one should always include high frequency terms when doing the clustering.
b) the improvement in effectiveness of retrieval that will result is likely to be very dramatic, even more than comes from using relevance feedback methods.
c) almost identical clusters are found as compared to those that committees of humans develop when building a thesaurus.
d) all of the above.
e) exactly 2 out of choices a through c (which?).
f) none of the above.

To use a stopping-rule or pruning scheme for making the search process more efficient, it is usually necessary to have:
a) a condition to test which indicates when no more query terms need to be considered (or are likely to need to be considered).
b) some way to order the query terms, i.e., to tell which ones are likely to have a bigger effect on the final similarity value.
c) a Boolean query form.
d) all of the above.
e) exactly 2 out of choices a through c (which?).
f) none of the above.

MARIAN initially did exact matching of words in a query against words in titles. In an earlier experiment we reduced document titles and queries to word stems, and had many confused users who asked about Huckleberry Finn and found many finance articles. How could we avoid such problems, and yet still help users find cat when they type in cats, or child when they request children? Briefly explain the file structures you might use.

WAIS takes all of your query terms as the basis for searching against all of the terms in each document. MARIAN, on the other hand, supports field-oriented searches: for example, it allows you to limit comparison of words you think occur in the title to words that actually are in titles of catalog entries. Thinking of this in terms of features, what are we really doing when we limit matches based on fields? Are features still based solely on words, or are they more complicated? How might this scheme be supported by data structures and processing if you had to improve WAIS to handle this for both initial searches and feedback searches?


fox@cs.vt.edu
Tue Aug 30 04:42:03 EDT 1994