Questions for Form B

  1. Why is the match heuristic not usually as important as the occurrence heuristic in Boyer-Moore type search algorithms?

  2. What is the character that is used for addressing the occurrence table in the Boyer-Moore-Horspool algorithm? Why is it used and what effect does this have on performance?

  3. If each internal node in a PAT tree had a count of the number of external nodes in that subtree, what operation would be much faster in PAT? Why?

  4. Explain how PAT carries out queries like docs def including quality in terms of a regular search and using a file about def tag occurrences (i.e., the def.P file in /u1/data/pat/oald3e which has the start and end position for def tags in that dictionary).

  5. Given that we can tell any two keys apart in a text collection by considering only the first 50 characters, how can we improve the performance of PAT indexing for that collection? Please discuss savings in time and/or space during the indexing process and in the resulting index files.


fox@cs.vt.edu
Tue Aug 30 04:43:17 EDT 1994