- Explain how the Ide dec-hi feedback scheme implements both: 1) query
expansion and 2) term (re)weighting based on feedback. That is, where
do the new terms come from, and where do the new weights come from?
You may frame your explanation around the formula and the processing
involved.
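As a study aid, the dec-hi update can be sketched in code. This is a minimal sketch under the usual statement of Ide dec-hi (add all known relevant document vectors to the query, subtract only the single highest-ranked non-relevant document); the function and variable names are illustrative, not from the text:

```python
from collections import defaultdict

def ide_dec_hi(query, relevant_docs, top_nonrelevant):
    """Ide dec-hi sketch: new terms enter the query from the relevant
    documents' vectors; new weights come from summing those vectors'
    term weights into the old query weights, minus the weights of the
    highest-ranked non-relevant document."""
    new_query = defaultdict(float, query)
    for doc in relevant_docs:          # add every relevant doc vector
        for term, weight in doc.items():
            new_query[term] += weight
    for term, weight in top_nonrelevant.items():  # subtract one doc
        new_query[term] -= weight
    # keep only terms whose weight stayed positive
    return {t: w for t, w in new_query.items() if w > 0}
```

Note how expansion and reweighting fall out of the same summation: a term absent from the original query but present in a relevant document gets a brand-new (positive) weight, while a term already in the query simply has its weight adjusted.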
- Using an accumulator and pruning modifications for retrieval / ranking:
  - a) is especially useful when there are long queries.
  - b) gives a more exact computation of similarity than other methods.
  - c) works well when rapid response is needed, when there are
millions of documents involved, when primary memory is very tiny, and
when queries are short lists of high-frequency terms.
  - d) all of the above.
  - e) none of the above.
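For review, a bounded accumulator table with pruning can be sketched as follows. This is an illustrative quit/continue-style approximation, not the text's exact algorithm; all names are made up for the example:

```python
def rank_with_accumulators(query_terms, inverted_index, max_accumulators=1000):
    """Term-at-a-time scoring with a bounded accumulator table.
    inverted_index maps term -> list of (doc_id, weight) postings.
    Terms are processed rarest-first; once the table is full, no new
    accumulators are created, so some documents are never scored."""
    accumulators = {}
    # rarest terms (shortest posting lists) first
    for term in sorted(query_terms,
                       key=lambda t: len(inverted_index.get(t, []))):
        for doc_id, weight in inverted_index.get(term, []):
            if doc_id in accumulators:
                accumulators[doc_id] += weight       # update existing score
            elif len(accumulators) < max_accumulators:
                accumulators[doc_id] = weight        # create new accumulator
            # else: pruned -- doc never enters the candidate set
    return sorted(accumulators.items(), key=lambda kv: kv[1], reverse=True)
```

The sketch makes the trade-off in the question concrete: pruning bounds memory and work (good for fast response over large collections), at the cost of an approximate rather than exact similarity computation.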
- Some versions of the probabilistic weighting scheme add values like 0.5
or 1.0 in appropriate places (e.g., see p. 256 in the text). This is based
on the Jeffreys prior or similar arguments and addresses statistical
problems that arise when dealing with small samples.
There are also modifications needed when relevance feedback is used but
no relevant documents are found in the first search. Please explain why
these types of refinements are necessary: what problems they avoid,
how much effect they are likely to have on effectiveness, and what user
or system behavior might help eliminate the need for them.
[Hint: this is a thought question - to see if you can apply
what you have read.]
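For reference while answering, one common form of this kind of weight (the Robertson-Sparck Jones relevance weight, which may differ in detail from the text's formula on p. 256) with the 0.5 correction is:

```latex
w_t = \log \frac{(r + 0.5)\,/\,(R - r + 0.5)}
               {(n - r + 0.5)\,/\,(N - n - R + r + 0.5)}
```

where $N$ is the collection size, $n$ the number of documents containing term $t$, $R$ the number of known relevant documents, and $r$ the number of relevant documents containing $t$. Without the added 0.5 terms, small samples such as $r = 0$ or $r = R$ would put a zero inside the logarithm or its denominator, yielding undefined or infinite weights.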
- In the vector space model, similarity is often measured with the cosine
formula. In computing it, we calculate:
  - a) the inner product of query and document vectors.
  - b) the length of the document vector.
  - c) the length of the query vector.
  - d) all of the above.
  - e) exactly 2 out of choices a through c (which?).
  - f) none of the above.
- Experimental studies have shown that adding too many new terms to a
query may reduce search effectiveness in a feedback situation. Briefly
explain two ways to limit the number of new terms added.
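Two simple limiting strategies can be sketched together; this is only an illustration of the idea (rank cutoff and weight threshold), with made-up names, not a list of the answers the question expects:

```python
def select_expansion_terms(candidate_weights, k=10, min_weight=0.0):
    """Limit query expansion two ways: (1) drop candidate terms whose
    feedback weight falls below a threshold, then (2) keep only the
    top-k survivors ranked by weight."""
    kept = [(t, w) for t, w in candidate_weights.items() if w > min_weight]
    kept.sort(key=lambda kv: kv[1], reverse=True)
    return kept[:k]
```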