Digital Libraries - Automatic Indexing


Automatic Indexing begins with texts, and leads to:

Words can be conflated by Stemming or simple Plural Removal:
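
Simple plural removal can be sketched with a couple of ordered suffix rules. The Python below is only a minimal illustration under stated assumptions: the function name `remove_plural`, the two rules, and the minimum word length are hypothetical choices made for brevity, not the specific method these notes refer to.

```python
def remove_plural(word: str) -> str:
    """Map a plural form to a singular-looking form using two suffix rules.

    Illustrative rule set (an assumption, not a standard algorithm):
      1. "...ies" becomes "...y"
      2. a trailing "s" is dropped unless the word ends in "ss" or "us"
    Words of three letters or fewer are left unchanged.
    """
    w = word.lower()
    if len(w) <= 3:
        return w
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith(("ss", "us")):
        return w[:-1]
    return w


for w in ["libraries", "documents", "class", "corpus"]:
    print(w, "->", remove_plural(w))
# libraries -> library
# documents -> document
# class -> class
# corpus -> corpus
```

Because "libraries" and "library" now map to the same string, both forms would be indexed under one term. A full stemmer goes further and also strips suffixes such as "-ing" or "-ed".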

Terminology

lexical analysis
    converting the input stream of characters into tokens
query processing
    analyzing a query and using it to find documents
stop word list (also stoplist or negative dictionary)
    list of words to be ignored in indexing (e.g., a, an, and, of, the)
token
    group of characters with collective significance (e.g., a word, number, or name)
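
The four terms above fit together in a small pipeline, sketched here in Python. Everything in it is an assumption made for illustration: the token pattern (runs of letters and digits), the lowercasing, the helper names (`tokenize`, `index_terms`, `build_index`, `search`), the sample documents, and the choice to treat a query as a conjunction of its terms. The stoplist holds only the five example words given in the definition.

```python
import re
from collections import defaultdict

# Stoplist: just the example words from the definition above.
STOPLIST = {"a", "an", "and", "of", "the"}

# Assumption: a token is a maximal run of ASCII letters or digits.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Lexical analysis: turn a character stream into lowercase tokens."""
    return [m.group(0).lower() for m in TOKEN_PATTERN.finditer(text)]


def index_terms(text, stoplist=STOPLIST):
    """Drop tokens that appear on the stoplist."""
    return [t for t in tokenize(text) if t not in stoplist]


def build_index(docs):
    """Build an inverted index mapping each term to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Query processing: analyze the query like a document, then
    return the ids of documents containing every remaining term."""
    terms = index_terms(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The indexing of a digital library",
    2: "A library of rare books",
    3: "Stemming and the stop word list",
}
index = build_index(docs)
print(index_terms(docs[1]))              # ['indexing', 'digital', 'library']
print(search(index, "the library"))      # {1, 2}
print(search(index, "digital library"))  # {1}
```

Running the query through the same analysis as the documents matters: the stop word "the" is dropped from the query just as it was dropped at indexing time, so it cannot cause a spurious mismatch.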

Steps in Automatic Indexing

For more information see: