Introduction to IR -- SALT86a (Another Look ...)
Introduction
- Boolean operators and queries
- Recall and Precision
- R and P Enhancement Devices: stemming, thesaurus, weighting
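
As an illustration of the first two items above, the sketch below treats Boolean operators as set operations over a toy collection and then computes recall and precision for the result. The documents, terms, relevance judgments, and the helper names `postings` and `precision_recall` are all hypothetical, chosen only to make the definitions concrete.

```python
# Toy collection: document id -> set of index terms (hypothetical data).
docs = {
    1: {"information", "retrieval", "boolean"},
    2: {"information", "theory"},
    3: {"retrieval", "evaluation"},
    4: {"database", "systems"},
}


def postings(term):
    """Return the ids of all documents indexed by `term`."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# Boolean operators map directly onto set operations.
and_result = postings("information") & postings("retrieval")  # AND
or_result = postings("information") | postings("retrieval")   # OR
not_result = postings("information") - postings("theory")     # AND NOT


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 3}  # assumed relevance judgments for this query
p_and, r_and = precision_recall(and_result, relevant)
p_or, r_or = precision_recall(or_result, relevant)
```

On this toy data the narrow AND query retrieves fewer documents at higher precision, while the broad OR query reaches full recall at lower precision, which is the usual trade-off the enhancement devices try to shift.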
Blair & Maron Article
- Experiment: 40 queries, 40K documents (350K pages)
- System: STAIRS
- P=.75, R=.20
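
To see what these two figures imply together, the short calculation below assumes a retrieved set of 100 documents. That size is an assumption made purely for the arithmetic, not a number from the experiment.

```python
precision, recall = 0.75, 0.20  # figures from the outline
retrieved = 100                 # assumed size of one retrieved set

# Relevant documents among those retrieved.
relevant_retrieved = precision * retrieved

# Total relevant documents in the collection implied by the recall figure.
total_relevant = relevant_retrieved / recall

# Relevant documents the search never surfaced.
missed = total_relevant - relevant_retrieved
```

Under that assumption, about 75 of the 100 retrieved documents are relevant, yet roughly 300 other relevant documents remain unretrieved, so a searcher who sees mostly relevant output can still be missing four fifths of the relevant material.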
Other Experiments
- MEDLARS, and sources of failure:
  - indexing language
  - document indexing
  - search formulation
  - user-system interaction
- NASA
- Cranfield
Overview of Automatic Methods
- Vector Weight: tf * idf
- Probabilistic Weight: log((N-n)/n), N = documents in collection, n = documents containing the term
- Term Discrimination
- Devices/transformations: phrase, thesaurus
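
The weighting schemes listed above can be sketched as follows. The outline does not spell out idf, so the common form log(N/n) is assumed here. The term discrimination value is computed as the change in average pairwise cosine similarity when a term is removed, which is one standard formulation and may differ in detail from the one used in the lecture. All function names are illustrative.

```python
import math


def tf_idf(tf, N, n):
    """Vector weight: term frequency times idf, with idf assumed to be log(N / n)."""
    return tf * math.log(N / n)


def probabilistic_weight(N, n):
    """log((N - n) / n); undefined when n == 0 or n == N."""
    return math.log((N - n) / n)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(vectors):
    """Average pairwise similarity of the collection (needs at least 2 vectors)."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        raise ValueError("density needs at least two vectors")
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def discrimination_value(vectors, term):
    """Density without `term` minus density with it.

    Positive: the term spreads documents apart (good discriminator).
    Negative: the term pulls documents together (poor discriminator).
    """
    without = [{t: w for t, w in v.items() if t != term} for v in vectors]
    return density(without) - density(vectors)
```

Note that the probabilistic weight is zero for a term occurring in exactly half the collection and negative for more frequent terms, whereas log(N/n) never goes below zero.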
Automatic Indexing Steps
- tokenize
- stop list
- stem
- weight
- find phrases
- build thesaurus
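
The steps above can be sketched end to end on a toy collection. Everything here is a simplified stand-in: the stop list is a handful of words, `stem` is crude suffix stripping rather than a real stemming algorithm, weights assume tf * log(N/n), phrases are adjacent term pairs found in at least two documents, and the "thesaurus" simply groups terms that co-occur in at least two documents. The sample texts and all function names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}  # tiny illustrative list


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def weight(doc_terms):
    """Assign tf * log(N / n) to every term of every document."""
    N = len(doc_terms)
    df = Counter(t for terms in doc_terms for t in set(terms))
    return [
        {t: tf * math.log(N / df[t]) for t, tf in Counter(terms).items()}
        for terms in doc_terms
    ]


def find_phrases(doc_terms, min_docs=2):
    """Adjacent term pairs that appear in at least `min_docs` documents."""
    pair_df = Counter(p for terms in doc_terms for p in set(zip(terms, terms[1:])))
    return {p for p, n in pair_df.items() if n >= min_docs}


def build_thesaurus(doc_terms, min_docs=2):
    """Map each term to the terms it co-occurs with in at least `min_docs` documents."""
    co_occurrence = Counter()
    for terms in doc_terms:
        unique = sorted(set(terms))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                co_occurrence[(a, b)] += 1
    thesaurus = {}
    for (a, b), n in co_occurrence.items():
        if n >= min_docs:
            thesaurus.setdefault(a, set()).add(b)
            thesaurus.setdefault(b, set()).add(a)
    return thesaurus


texts = [
    "Indexing the documents for information retrieval",
    "Information retrieval systems and weighting",
    "Stemming of index terms",
]
doc_terms = [index_terms(t) for t in texts]
weights = weight(doc_terms)
phrases = find_phrases(doc_terms)
thesaurus = build_thesaurus(doc_terms)
```

The crude stemmer conflates "indexing" with "index" but also produces non-words such as "stemm", which is acceptable for matching purposes since queries pass through the same function.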