Edward A. Fox
Department of Computer Science,
Virginia Tech, Blacksburg, VA 24061-0106
In this Unit we explore lexical analysis, stopword removal, and stemming, and discuss the issues underlying each: tokenization, construction of finite state machines, and suffix lookup. Data structures, algorithms, implementation guidelines, and experimental results are given.
This Unit has two chapters and two laboratory exercises. The lecture coverage provides an overview.
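As a rough illustration of how the three steps named above fit together, the following minimal Python sketch tokenizes text, drops stopwords, and strips suffixes by table lookup. It is not the implementation covered in the chapters. The stopword list, the suffix table, the minimum stem length, and all function names are hypothetical choices made for this example, and a regular expression stands in for a hand-built finite state machine.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "is", "are", "to"})

# Hypothetical suffix table, ordered longest first so the longest match wins.
# This is a simple lookup, not a full stemming algorithm.
SUFFIXES = ("ization", "ations", "ation", "ing", "ed", "es", "s")

# Assumed lower bound on stem length, to avoid over-stripping short words.
MIN_STEM_LENGTH = 3

# A token is a maximal run of lowercase letters or digits.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Strip the first (longest) matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LENGTH:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Run the three steps in order: tokenize, remove stopwords, stem."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]


if __name__ == "__main__":
    print(index_terms("The stemming of indexed terms"))
```

Checking the suffix table longest first keeps a short suffix such as "s" from matching before a longer one such as "ations"; the minimum stem length is one simple guard against reducing short words to fragments.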