Unit SS, Part B: String Matching, Boyer-Moore and Variants
Standard Boyer-Moore (B-M)
- Shift by the maximum of two heuristics:
- 1) Match heuristic
- Shift right so that the characters already matched still match, and a different pattern character lands at the position where the mismatch occurred
- Example: abracadabra (m = 11, positions j numbered 1..11)
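- ddhat[j] = min{s + m - j | ...}, where j is the pattern position of the mismatch, s is how far the pattern slides right, and ddhat[j] is how far the text pointer advances
- j = 11 ("a") mismatches: slide s = 1 so "r" is tried at that text position; 1 = 1 + 11 - 11
- j = 10 ("r") mismatches after "a" matched: slide s = 3 so the "a" at position 8, preceded by "d", is tried (see if the text has "da"); 4 = 3 + 11 - 10
- j = 9 ("b") mismatches after "ra" matched: the other "ra" (positions 3-4) is preceded by "b" again, so slide s = 10, leaving only the leading "a" over the matched "a"; 12 = 10 + 11 - 9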
- 2) Occurrence heuristic
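- Given the text character X that failed to match, find the rightmost position in the pattern where X occurs and shift so that occurrence lines up with X; the table stores that position's distance from the right end of the pattern (m if X does not occur)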
- Example: abracadabra
- "a" is 0 from the right, "b" is 2 in, "c" is 6 in, . . .
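A minimal Python sketch of standard Boyer-Moore as outlined above, which also checks the abracadabra table entries. It assumes the "..." in ddhat stands for the usual strong good-suffix conditions (the new character must differ, and the already-matched suffix must still match). All function names are hypothetical, and dd_hat is a brute-force search over s (O(m^3)), meant only for illustration. Pattern positions j are 1-indexed as in the notes, while Python strings are 0-indexed, hence `pat[j - 1]`.

```python
def occurrence_table(pat: str) -> dict[str, int]:
    """Occurrence heuristic: distance from the right end of pat to the rightmost
    occurrence of each character. Characters absent from pat are left out;
    callers treat them as m (slide the whole pattern past them)."""
    m = len(pat)
    # Later (further-right) occurrences overwrite earlier ones.
    return {c: m - 1 - k for k, c in enumerate(pat)}


def dd_hat(pat: str) -> list[int]:
    """Match heuristic ddhat[1..m] (index 0 unused), by brute force over s.

    Assumed definition, 1-indexed as in the notes:
      ddhat[j] = min{ s + m - j : s >= 1,
                      s >= j or pat[j-s] != pat[j],                      # new char differs
                      s >= i or pat[i-s] == pat[i] for all j < i <= m }  # suffix still matches
    """
    m = len(pat)
    p = " " + pat                      # pad so p[1..m] is the pattern
    dd = [0] * (m + 1)
    for j in range(1, m + 1):
        for s in range(1, m + 1):      # s = m always qualifies, so the loop breaks
            new_char_differs = s >= j or p[j - s] != p[j]
            suffix_matches = all(s >= i or p[i - s] == p[i]
                                 for i in range(j + 1, m + 1))
            if new_char_differs and suffix_matches:
                dd[j] = s + m - j
                break
    return dd


def bm_search(text: str, pat: str) -> int:
    """Index of the first occurrence of pat in text, or -1."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0
    delta1 = occurrence_table(pat)
    dd = dd_hat(pat)
    i = m - 1                          # text index under the last pattern char
    while i < n:
        j = m                          # 1-indexed pattern position being compared
        while j > 0 and text[i] == pat[j - 1]:   # compare right to left
            i -= 1
            j -= 1
        if j == 0:
            return i + 1               # whole pattern matched
        # Mismatch at position j: take the larger of the two heuristic shifts.
        i += max(delta1.get(text[i], m), dd[j])
    return -1


dd = dd_hat("abracadabra")
print(dd[11], dd[10], dd[9])                         # 1 4 12
print(occurrence_table("abracadabra"))               # {'a': 0, 'b': 2, 'r': 1, 'c': 6, 'd': 4}
print(bm_search("xxabracadabrayy", "abracadabra"))   # 2
```

Because every ddhat[j] is at least m - j + 1, the pointer always moves past the old window end, even when the occurrence shift alone would point backwards.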
Simplified Boyer-Moore
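- Boyer-Moore with only the occurrence heuristic
- Why?
- Typical patterns are not periodic, so the match heuristic seldom gives the larger shift
- It needs less space (no match-heuristic table)
- Thus it should be faster on average
- In practice, it performs slightly worse than full Boyer-Moore (see tests)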
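A sketch of the simplified variant, keeping only the occurrence table (function name hypothetical). Without the match table, the occurrence shift alone can point backwards when the mismatched text character's rightmost occurrence lies to the right of the mismatch. The notes don't say how that case is handled, so this sketch assumes the common remedy of always moving the window at least one position.

```python
def simplified_bm_search(text: str, pat: str) -> int:
    """Boyer-Moore with only the occurrence heuristic; first match index or -1."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0
    # Distance from the right end to the rightmost occurrence of each character.
    delta1 = {c: m - 1 - k for k, c in enumerate(pat)}
    i = m - 1                          # text index under the last pattern char
    while i < n:
        j = m                          # 1-indexed pattern position being compared
        while j > 0 and text[i] == pat[j - 1]:   # compare right to left
            i -= 1
            j -= 1
        if j == 0:
            return i + 1
        # The old window ended at i + (m - j); m - j + 1 puts the new end one
        # past it, so the window always advances even if delta1 is too small.
        i += max(delta1.get(text[i], m), m - j + 1)
    return -1


print(simplified_bm_search("abracadabra", "dab"))   # 6
```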
Boyer-Moore-Horspool
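- Let X be the last character of the pattern and T the text character aligned with X; after a mismatch at any position, use T (not the mismatched text character) to look up the skip in the occurrence-heuristic table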
- Boyer-Moore-Horspool-Sunday:
- Go back to comparing left to right
- Use only the occurrence heuristic, but look up the text character just past the window (the one aligned with pat[m+1]), so a shift can be as large as m + 1
- Empirically, BMH gives the best results
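The Horspool and Sunday variants above can be sketched in Python as follows (function names hypothetical). Two details go beyond the notes and follow the usual formulations, so treat them as assumptions: Horspool's skip table is built from all but the last pattern character, so every skip is at least 1; Sunday's table gives m - k for the rightmost 0-based index k of a character, and m + 1 for characters absent from the pattern.

```python
def horspool_search(text: str, pat: str) -> int:
    """Boyer-Moore-Horspool; first match index or -1."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0
    # Built over pat[0..m-2] only: including the last char would give it skip 0.
    skip = {c: m - 1 - k for k, c in enumerate(pat[:-1])}
    pos = 0                                     # left end of the current window
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pat[j]:   # compare right to left
            j -= 1
        if j < 0:
            return pos
        # Wherever the mismatch was, look up T: the text char under the last pattern char.
        pos += skip.get(text[pos + m - 1], m)
    return -1


def sunday_search(text: str, pat: str) -> int:
    """Boyer-Moore-Horspool-Sunday; first match index or -1."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0
    # shift[c] = m - (rightmost 0-based index of c); absent characters shift m + 1.
    shift = {c: m - k for k, c in enumerate(pat)}
    pos = 0
    while pos <= n - m:
        j = 0
        while j < m and text[pos + j] == pat[j]:    # compare left to right
            j += 1
        if j == m:
            return pos
        if pos + m >= n:            # no text char past the window: nothing left to try
            break
        # Look up the text char just past the window, i.e. under pat[m+1] (1-indexed).
        pos += shift.get(text[pos + m], m + 1)
    return -1


print(horspool_search("abracadabra", "cad"), sunday_search("abracadabra", "cad"))   # 4 4
```

Both keep a single table indexed by text character, which is the space saving that motivates dropping the match heuristic.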