- Why is the match heuristic not usually as important
as the occurrence heuristic in Boyer-Moore type search
algorithms?
- What is the character that is used for addressing the
occurrence table in the Boyer-Moore-Horspool algorithm? Why
is it used and what effect does this have on performance?
- If each internal node in a PAT tree had a count of the number
of external nodes in that subtree, what operation would be much faster
in PAT? Why?
- Explain how PAT carries out queries like
docs def including quality
in terms of a regular search and using a file about def
tag occurrences (i.e., the
def.P file in /u1/data/pat/oald3e which has the start
and end position for def tags in that dictionary).
- Given that we can tell any two keys apart in a text collection
by considering only the first 50 characters, how can we improve the
performance of PAT indexing for that collection?
Please discuss savings in time and/or space during the indexing
process and in the resulting index files.