Unit SS, Part B: String Matching, Boyer-Moore and Variants
Standard B-M
- Shift by the maximum of 2 heuristics:
- 1) Match heuristic
- Shift right so that all chars already matched still match, and a different char now sits at the position where the mismatch occurred
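- Example: abracadabra (m = 11)
- ddhat[j] = min{s+m-j | ...}, where s = skip amount; the m-j term returns the text pointer from the mismatch position to under the pattern's last char
- a (j=11): mismatch on the first compare, so shift 1 to try the "r"; 1 = 1+11-11
- r (j=10): "a" matched, so shift 3 to see if the text has "da" there; 4 = 3+11-10
- b (j=9): "ra" matched, and the only other "ra" is preceded by "b", which would just fail again, so shift 10, keeping only the leading "a" under the matched final "a"; 12 = 10+11-9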
- 2) Occurrence heuristic
- Given the text char X that didn't match, find the rightmost place in pat where X occurs and line it up with X; the shift is d[X], the distance of that occurrence from the right end (d[X] = m if X is not in pat)
- Example: abracadabra
- "a" is 0 from the right, "b" is 2 in, "c" is 6 in from the right, ...
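A minimal Python sketch of both tables for the pattern abracadabra, computed straight from the definitions by brute force (fine for illustration, not an efficient construction). It assumes the elided conditions in the ddhat formula are the usual ones: s >= 1, (s >= j or pat[j-s] != pat[j]), and (s >= i or pat[i-s] == pat[i]) for every j < i <= m. The function names `occurrence_table` and `match_table` are hypothetical.

```python
def occurrence_table(pat):
    """d[c] = distance of the rightmost c in pat from the right end.

    Chars not in pat should be looked up as d.get(c, len(pat)).
    """
    m = len(pat)
    d = {}
    for pos, c in enumerate(pat, start=1):  # later (further right) positions overwrite earlier ones
        d[c] = m - pos
    return d


def match_table(pat):
    """ddhat[j] for j = 1..m (1-based), by brute force over the slide s.

    Assumed conditions on s (standing in for the '...' in the notes):
      s >= 1, (s >= j or pat[j-s] != pat[j]),
      and (s >= i or pat[i-s] == pat[i]) for every j < i <= m.
    """
    m = len(pat)
    p = " " + pat  # pad so p[1..m] is the pattern, 1-based like the notes
    dd = {}
    for j in range(1, m + 1):
        for s in range(1, m + 1):  # s = m always qualifies, so the loop always breaks
            if s < j and p[j - s] == p[j]:
                continue  # the same char would land on the mismatch again
            if all(s >= i or p[i - s] == p[i] for i in range(j + 1, m + 1)):
                dd[j] = s + m - j
                break
    return dd


d = occurrence_table("abracadabra")
dd = match_table("abracadabra")
print(d["a"], d["b"], d["c"], d.get("t", 11))  # 0 2 6 11
print(dd[11], dd[10], dd[9])                   # 1 4 12
```

The printed values agree with the a, r, b rows of the match heuristic and the "a", "b", "c" distances of the occurrence heuristic.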
Simplified Boyer-Moore
- Boyer-Moore using only the occurrence heuristic
- Why?
- Patterns are typically not periodic
- Less space (no match-heuristic table)
- So it should be faster on average
- In practice, it performs slightly worse than Boyer-Moore (see tests)
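A minimal sketch of Simplified Boyer-Moore in Python, assuming 0-based window starts and a non-empty pattern (`simplified_bm` is a hypothetical name). With last[c] the rightmost index of c in the pattern, the slide j - last[c] equals d[c] minus the number of chars already matched; it can be 0 or negative when that occurrence lies to the right of the mismatch, so the sketch takes max(1, ...), a guarantee full Boyer-Moore gets from the match heuristic.

```python
def simplified_bm(text, pat):
    """Return 0-based start positions of pat in text, using only the occurrence heuristic."""
    m, n = len(pat), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    last = {c: i for i, c in enumerate(pat)}  # rightmost 0-based index of each char
    matches = []
    s = 0  # window start in text
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pat[j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # no match heuristic, so just slide by one after a full match
        else:
            # Line up the rightmost occurrence of the mismatched text char,
            # or slide past it entirely if it is not in pat.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


print(simplified_bm("ababaxabracabracadabratcadabrax", "abracadabra"))  # [11]
```

Start 11 (0-based) is the same occurrence the BM trace finds at 1-based position 12.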
Boyer-Moore-Horspool
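- Let X = last char in pattern and T = the text char aligned with X; after every attempt (match or mismatch), shift by T's entry in the occurrence table
- The table is built from the first m-1 pattern chars only, so the shift is always at least 1
- Boyer-Moore-Horspool-Sunday:
- go back to left-to-right checking
- use only the occurrence test, but look up the text char just past the window (the one that would be at pat[m+1]), so a shift can be as large as m+1
- BMH gives the best empirical results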
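Minimal Python sketches of both variants, assuming 0-based window starts, non-empty patterns, and the hypothetical names `horspool` and `sunday`. Horspool builds its table from pat[:-1] and always looks up the text char under the pattern's last char; Sunday compares left to right and looks up the text char just past the window, defaulting to a shift of m+1.

```python
def horspool(text, pat):
    """0-based match starts; shift keyed on the text char under pat's last char."""
    m, n = len(pat), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Occurrence table over pat[0..m-2] only, so every shift is >= 1.
    shift = {c: m - 1 - i for i, c in enumerate(pat[:-1])}
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pat[j]:  # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
        # Same lookup whether or not the attempt matched.
        s += shift.get(text[s + m - 1], m)
    return matches


def sunday(text, pat):
    """0-based match starts; left-to-right checks, shift keyed on text[s + m]."""
    m, n = len(pat), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Distance from the rightmost c to the slot just past the pattern (pat[m+1], 1-based).
    shift = {c: m - i for i, c in enumerate(pat)}
    matches = []
    s = 0
    while s <= n - m:
        j = 0
        while j < m and text[s + j] == pat[j]:  # compare left to right
            j += 1
        if j == m:
            matches.append(s)
        if s + m >= n:
            break  # no char past the window to look up
        s += shift.get(text[s + m], m + 1)
    return matches


example = ("ababaxabracabracadabratcadabrax", "abracadabra")
print(horspool(*example), sunday(*example))  # [11] [11]
```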
Example of BM Searching for Pattern in Text
BOYER-MOORE
k = text position under comparison (each attempt starts with k under the pattern's last char); d = occurrence table, dd = match table indexed by j
ababaxabracabracadabratcadabrax
abracadabra
          ^(no-match, k=11: k += max(d[c]=6, dd[11]=1) = 6, so k = 17)
skip 6 to line up c in text with c in pattern
ababaxabracabracadabratcadabrax
      abracadabra
                ^(match)
ababaxabracabracadabratcadabrax
      abracadabra
               ^(no-match, k=16: k += max(d[c]=6, dd[10]=4) = 6, so k = 22)
skip 6 to line up c in text with c in pattern
ababaxabracabracadabratcadabrax
           abracadabra
                     ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
                    ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
                   ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
                  ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
                 ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
                ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
               ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
              ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
             ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
            ^(match)
ababaxabracabracadabratcadabrax
           abracadabra
           ^(match, full match at position 12; skip 7 to line up the leading "abra" with the "abra" just matched)
ababaxabracabracadabratcadabrax
                  abracadabra
                            ^(no-match, k=29: k += max(d[r]=1, dd[11]=1) = 1, so k = 30)
ababaxabracabracadabratcadabrax
                   abracadabra
                             ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                            ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                           ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                          ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                         ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                        ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                       ^(match)
ababaxabracabracadabratcadabrax
                   abracadabra
                      ^(no-match, k=23: k += max(d[t]=11, dd[4]=14) = 14, so k = 37)
ababaxabracabracadabratcadabrax
                          abracadabra
(k = 37 > 31, past text end, so quit)
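The trace above can be reproduced with a short Python sketch of Boyer-Moore in the notes' 1-based k notation. Assumptions: both tables are built by brute force from their definitions (the elided "..." in ddhat taken to be the usual conditions), and after a full match the pattern slides by its period via k += 1 + dd[1], since k then sits one position left of the match. The names `bm_search` and `ends` are hypothetical; `ends` records the text position under the pattern's last char at the start of each attempt.

```python
def bm_search(text, pat):
    """Boyer-Moore with k += max(d[c], dd[j]); returns (1-based match starts, attempt ends)."""
    m, n = len(pat), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    P, T = " " + pat, " " + text  # pad so indexing is 1-based, as in the notes

    d = {c: m - pos for pos, c in enumerate(pat, start=1)}  # occurrence heuristic
    dd = {}  # match heuristic, brute force over the slide s
    for j in range(1, m + 1):
        for s in range(1, m + 1):
            if s < j and P[j - s] == P[j]:
                continue
            if all(s >= i or P[i - s] == P[i] for i in range(j + 1, m + 1)):
                dd[j] = s + m - j
                break

    matches, ends = [], []
    k = m
    while k <= n:
        ends.append(k)
        j = m
        while j > 0 and T[k] == P[j]:  # compare right to left
            k -= 1
            j -= 1
        if j == 0:
            matches.append(k + 1)
            k += 1 + dd[1]  # k is one left of the match; slide by the period
        else:
            k += max(d.get(T[k], m), dd[j])
    return matches, ends


print(bm_search("ababaxabracabracadabratcadabrax", "abracadabra"))
# ([12], [11, 17, 22, 29, 30])
```

The attempt ends 11, 17, 22, 29, 30 and the final k = 37 match the k values in the trace.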