Introduction

The most common type of search is string searching, covered in Chapter 10. UNIX has the grep family of search tools, and they are among the most frequently used UNIX commands, consuming a large share of the CPU and disk resources of many systems. String searching using fast algorithms is reasonably efficient, and sometimes those same algorithms can be simply changed to work with specialized computer architectures (such as parallel processors, or fast disk subsystems, sometimes with search chips on each disk head).

Plain string matching is valuable in situations when there is no space to build supporting indexes, when searching occurs infrequently, when complex patterns are involved, when data is transient or generally encrypted, or as a post-processing operation on retrieved sets that must be further reduced in size. On the other hand, when there is repeated use to search a collection using strings or pattern expressions, such as for dictionary or literary analysis, then some type of supporting index is needed.

Thus, PAT, a commercial system that uses Patricia trees and arrays, applies the algorithms and data structures discussed in Chapter 5. PAT's elegant model considers a collection as a long text string, and makes efficient use of the specialized supporting index structures. PAT will be used for searching a variety of test collections (i.e., 2), and to carry out a wide variety of operations that are well supported by Patricia trees.


fox@cs.vt.edu
Thu Oct 6 15:01:34 EDT 1994