- The SIRE system ranked documents only after a Boolean query had selected
those that matched the query expression. With the P-norm scheme,
if one starts with a regular Boolean query and simply uses a low value of p
everywhere, one can rank all documents that contain any of the query
terms, including documents that would not be retrieved if SIRE
were processing that query. Would the P-norm approach have higher
recall than the SIRE approach, given the same typical Boolean query to
start with? Which of the two schemes would require more processing?
Explain your answers briefly.
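
For reference while working on this question, the P-norm similarity for unweighted query terms can be sketched as below. This is a minimal illustration, not an answer key: the function names `pnorm_or` and `pnorm_and` are made up for this sketch, and it assumes document term weights in [0, 1] and equal query term weights.

```python
def pnorm_or(doc_weights, p):
    """P-norm OR: normalized distance from the all-zeros point."""
    n = len(doc_weights)
    return (sum(w ** p for w in doc_weights) / n) ** (1.0 / p)


def pnorm_and(doc_weights, p):
    """P-norm AND: one minus the normalized distance from the all-ones point."""
    n = len(doc_weights)
    return 1.0 - (sum((1.0 - w) ** p for w in doc_weights) / n) ** (1.0 / p)


# A document containing only the first term of the query "A AND B".
partial_match = [1.0, 0.0]

for p in (1, 2, 100):
    # At low p the partial match still gets a sizeable score;
    # as p grows the score approaches the strict Boolean result (0).
    print(p, round(pnorm_and(partial_match, p), 4))
```

At p = 1 both operators reduce to a simple average of the term weights, which is why a low p everywhere lets partially matching documents enter the ranking.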
- When term-term clustering is used to try to identify terms to add for
query expansion:
  - a) one should always include high frequency terms when doing
    the clustering.
  - b) the improvement in effectiveness of retrieval that will result is
    likely to be very dramatic, even more than comes from using relevance
    feedback methods.
  - c) almost identical clusters are found as compared to those that
    committees of humans develop when building a thesaurus.
  - d) all of the above.
  - e) exactly 2 out of choices a through c (which?).
  - f) none of the above.
- To use a stopping-rule or pruning scheme for making the search process
more efficient, it is usually necessary to have:
  - a) a condition to test which indicates when no more query terms
    need to be considered (or are likely to need to be considered).
  - b) some way to order the query terms, i.e., to tell which ones are
    likely to have a bigger effect on the final similarity value.
  - c) a Boolean query form.
  - d) all of the above.
  - e) exactly 2 out of choices a through c (which?).
  - f) none of the above.
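
As background for this question, here is a rough sketch of what a stopping rule can look like in a term-at-a-time ranked search. Everything in it is an assumption made for illustration: the function name `ranked_search`, the dictionary layout of the postings, and the rule itself (stop once the terms still unprocessed cannot change which documents are in the top k). Within-document weights are assumed to lie in [0, 1], so a term can add at most its query weight to any score.

```python
import heapq


def ranked_search(postings, query_weights, k):
    """Return (top-k doc ids, number of query terms actually processed).

    postings:      term -> {doc_id: within-document weight in [0, 1]}
    query_weights: term -> query term weight (for example an idf value)
    """
    # Process the terms that can contribute most to a score first.
    terms = sorted(query_weights, key=query_weights.get, reverse=True)
    remaining = sum(query_weights.values())  # upper bound on any future gain
    scores = {}
    used = 0
    for term in terms:
        # Stopping test: if the gap between rank k and rank k+1 exceeds
        # everything the remaining terms could add, the top-k set is fixed.
        top = heapq.nlargest(k + 1, scores.values())
        if len(top) >= k:
            runner_up = top[k] if len(top) > k else 0.0
            if top[k - 1] - runner_up > remaining:
                break
        for doc_id, weight in postings.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + query_weights[term] * weight
        remaining -= query_weights[term]
        used += 1
    ranked = sorted(scores, key=lambda d: (-scores[d], d))[:k]
    return ranked, used


postings = {
    "retrieval": {"d1": 1.0, "d2": 0.5},
    "ranking": {"d1": 0.9},
    "the": {"d1": 0.5, "d2": 0.5, "d3": 0.5},
}
query_weights = {"retrieval": 3.0, "ranking": 2.0, "the": 0.1}
print(ranked_search(postings, query_weights, k=1))
```

Note that this particular rule only fixes membership of the top k; the scores are partial sums, so the order inside the top k may differ from a full evaluation.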
- MARIAN initially did exact matching of words in a query against words in
titles. In an earlier experiment we reduced document titles and queries to
word stems, and many users were confused when they asked about Huckleberry
Finn and got back many finance articles, presumably because "Finn" and
"finance" were reduced to the same stem. How could we avoid such problems
and still help users find "cat" when they type "cats", or "child" when they
request "children"? Please briefly explain the file structures you might use.
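
One possible direction, sketched here only to make the question concrete: keep an exact-word inverted file, and add a separate table of word variants that is consulted at query time. The names `build_exact_index`, `VARIANTS`, and `search` are hypothetical, and the variant table is a tiny hand-built stand-in for whatever morphological resource a real system would use. This is not a description of how MARIAN works.

```python
from collections import defaultdict


def build_exact_index(titles):
    """Inverted file keyed by the exact (lowercased) title word."""
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for word in title.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical variant table: only genuine inflectional variants are linked,
# so "finn" is never connected to "finance".
VARIANTS = {
    "cats": ["cat"], "cat": ["cats"],
    "children": ["child"], "child": ["children"],
}


def search(index, word):
    """Return (exact matches, matches found only through a variant form)."""
    word = word.lower()
    exact = set(index.get(word, ()))
    related = set()
    for form in VARIANTS.get(word, []):
        related |= index.get(form, set())
    return sorted(exact), sorted(related - exact)


titles = {
    1: "The Cat in the Hat",
    2: "Adventures of Huckleberry Finn",
    3: "Corporate Finance Basics",
    4: "Child Development",
}
index = build_exact_index(titles)
print(search(index, "cats"))
print(search(index, "Finn"))
```

Keeping exact and variant matches in separate result sets lets the system rank or label them differently, which is one way to reduce user confusion.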
- WAIS uses all of your query terms as the basis for searching against all
of the terms in each document. MARIAN, on the other hand, supports
field-oriented searches: for example, it lets you restrict words you think
occur in the title so that they are compared only with words that actually
appear in the titles of catalog entries. Thinking of this in terms of
features, what are we really doing when we limit matches based on fields?
Are features still based solely on words, or are they
more complicated? How might this scheme be supported by data structures
and processing if you had to improve WAIS to handle it for both initial
searches and feedback searches?
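
To make the idea of field-qualified features concrete, the sketch below treats each feature as a (field, word) pair rather than a bare word. It is an illustrative assumption, not the actual WAIS or MARIAN design; the names `build_field_index`, `search`, and `feedback_features` are invented for this example, and scoring is a plain count of matching features.

```python
from collections import Counter, defaultdict


def build_field_index(records):
    """Inverted file keyed by (field, word) instead of word alone."""
    index = defaultdict(set)
    for doc_id, fields in records.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index


def search(index, features):
    """Rank documents by how many (field, word) features they match."""
    scores = Counter()
    for feature in features:
        for doc_id in index.get(feature, ()):
            scores[doc_id] += 1
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def feedback_features(records, relevant_ids):
    """Collect field-qualified features from documents judged relevant."""
    features = set()
    for doc_id in relevant_ids:
        for field, text in records[doc_id].items():
            features.update((field, w) for w in text.lower().split())
    return features


records = {
    1: {"title": "Adventures of Huckleberry Finn", "author": "Twain"},
    2: {"title": "Mark Twain A Biography", "author": "Paine"},
}
index = build_field_index(records)
print(search(index, [("author", "twain")]))  # only record 1 has this feature
print(search(index, [("title", "twain")]))   # only record 2 has this feature
```

Because feedback draws its features from the same (field, word) space, an initial search and a feedback search can share one index and one scoring routine.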