Summary of Key Concepts

  1. Many users find it easier to submit natural language queries, i.e., queries that are simply (long) lists of good keywords or phrases, instead of building complex Boolean queries.
  2. Similarity measures that consider collection statistics can be used to rank retrieved documents so that relevant documents tend to be presented before others.
  3. Expanding user queries with terms from relevant retrieved documents or other sources (e.g., morphological or thesaurus processing) can often improve effectiveness, especially with good term selection or screening, weighting, and similarity computation.
  4. Reweighting terms based on relevance feedback data can improve the effectiveness of document ranking, essentially training the system to recognize the terms that relate to an information need.
  5. Data structures and implementations for ranking and relevance feedback are derived from an overall model (e.g., vector, probabilistic) and tuned both to classes of document collections and to individual collections.
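
The ranking idea in item 2 can be sketched with TF-IDF weighting and cosine similarity in the vector model. The toy collection, document names, and tokenization below are illustrative assumptions, not taken from the text; a minimal sketch only.

```python
import math
from collections import Counter

# Toy collection; document ids and contents are illustrative only.
docs = {
    "d1": "information retrieval ranking of documents",
    "d2": "boolean queries and keyword queries",
    "d3": "relevance feedback improves document ranking",
}

def tokenize(text):
    return text.lower().split()

N = len(docs)
# df[t] = number of documents containing term t (a collection statistic)
df = Counter()
for text in docs.values():
    df.update(set(tokenize(text)))

def tfidf_vector(text):
    # Weight each term by tf * idf; idf favors rare, discriminating terms.
    tf = Counter(tokenize(text))
    return {t: f * math.log(N / df[t]) for t, f in tf.items() if df[t]}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query):
    # Score every document against the query, best first.
    qv = tfidf_vector(query)
    scores = {d: cosine(qv, tfidf_vector(text)) for d, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, rank("document ranking") places d3 first, since it shares both query terms, while d2 shares none and scores zero.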
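
The reweighting in item 4 is commonly realized by Rocchio-style relevance feedback, which moves the query vector toward relevant documents and away from nonrelevant ones. The alpha/beta/gamma values below are illustrative defaults, not prescribed by the text.

```python
# A minimal sketch of Rocchio-style query reweighting; vectors are
# dicts mapping term -> weight. Coefficients are illustrative only.
def rocchio(query_vec, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query_vec)
    for v in relevant + nonrelevant:
        terms |= set(v)
    new_q = {}
    for t in terms:
        # Centroid weight of t in the judged relevant / nonrelevant sets.
        pos = (sum(v.get(t, 0.0) for v in relevant) / len(relevant)
               if relevant else 0.0)
        neg = (sum(v.get(t, 0.0) for v in nonrelevant) / len(nonrelevant)
               if nonrelevant else 0.0)
        w = alpha * query_vec.get(t, 0.0) + beta * pos - gamma * neg
        if w > 0:  # keep only positively weighted terms
            new_q[t] = w
    return new_q
```

Note how terms from relevant documents enter the query with positive weight (the "training" effect), while terms seen only in nonrelevant documents drop out.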
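
For item 5, the data structure most commonly used to implement ranking is the inverted index. The sketch below is a generic illustration under assumed names, not the specific implementation the text refers to.

```python
from collections import defaultdict

# A minimal inverted index: term -> postings list of (doc_id, term_freq).
# Term frequencies in postings support ranked (not just Boolean) retrieval.
def build_index(docs):
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = {}
        for term in text.lower().split():
            counts[term] = counts.get(term, 0) + 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index
```

Given {"d1": "ranking of documents", "d2": "document ranking ranking"}, the postings list for "ranking" is [("d1", 1), ("d2", 2)], so scoring a query term touches only the documents that contain it.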


fox@cs.vt.edu
Thu Oct 27 01:30:52 EDT 1994