You should be able to answer each of the following questions.
- Why should we seriously consider stopword removal as a part of
lexical analysis, instead of as a later step using hashing?
- What are the main stages of the lexical analysis and stopword
removal routine that make up the preprocessing and lookup phases?
- In general, what happens during each phase of stemming
with Porter's algorithm?
- What is the empirical evidence regarding the effect of stemming
on space, recall, and precision, compared with other related schemes?