Summary of Key Concepts

  1. Inverted files are widely used to support Boolean query processing, which also involves set manipulation, commonly done on sorted lists. Fast construction of those files exploits rapid in-memory computations, especially: sorting of small lists, and sorted list insertion. Then the remaining steps generally have linear complexity.

  2. Set manipulation can also be done using bit vectors, or for large ranges in values, hash tables. Most processing has linear complexity, and that may be sub- linear for bit vectors if operations on entire bytes or words has hardware support.

  3. Computing similarity between Boolean queries and documents with the MMM, Paice, and P-norm schemes has led to improvements in precision, at fixed recall levels, compared to conventional Boolean query processing. Softening the query interpretation, with ranking made possible as a result, can help better model users' view of these queries.


fox@cs.vt.edu
Tue Sep 6 04:51:11 EDT 1994