Introduction

Clustering is an old field, related to classification and taxonomic analysis. It plays a crucial role in exploratory data analysis across the sciences, where observations are grouped based on some notion of similarity.

In information retrieval, some concepts must be adapted to make clustering fit, due to such issues as the following:

In the late 1960s, it first became possible to use computers to cluster information retrieval data. Moderate success was achieved in clustering terms to help identify groups of related words. Similarly, some benefits were shown in clustering documents, so that the documents in a cluster could be retrieved together. Since then, many faster algorithms have been developed, tests have indicated which methods seem to work best, collections can be analyzed to see whether clustering will be helpful, and fast computers now allow us to cluster even relatively large collections.


fox@cs.vt.edu
Thu Oct 27 06:33:06 EDT 1994