Naive Algorithm
- O(mn) worst case, since the entire pattern may be tried at each position in the text string
- Definitions
- i = index over text where a match might start
- j = index over pattern (1<=j<=m)
- k = index over text while trying to match
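The naive scan can be sketched as follows, using the slide's i, j, k with 1-based positions. The function name naive_search is a hypothetical label for this illustration, not something from the source.

```python
def naive_search(text, pat):
    """Return the 1-based start positions of pat in text (naive O(mn) scan)."""
    n, m = len(text), len(pat)
    matches = []
    # i = position in text where a match might start
    for i in range(1, n - m + 2):
        j, k = 1, i          # j indexes the pattern, k indexes the text
        while j <= m and text[k - 1] == pat[j - 1]:
            j += 1
            k += 1
        if j > m:            # ran off the end of the pattern: full match
            matches.append(i)
    return matches


if __name__ == "__main__":
    print(naive_search("abracadabracadabra", "abracadabra"))
```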
Knuth-Morris-Pratt - 1
- Algorithm in Fig. 10.2
- k advances in text, j in pattern
- if pattern mismatch occurs, then use "next" position in pattern
- Explanation of "next" table example
- mismatch pat[j] => try pat[next[j]]
- Pattern abracadabra as char/next pairs: a/0 b/1 r/1 a/0 c/2 a/0 d/2 a/0 b/1 r/1 a/0 X/5 (for X not in text)
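A minimal sketch of the search loop described above, not a copy of Fig. 10.2. It assumes the "next" table is supplied as a list of m+1 values, with the last entry (the X/5 slot) used as the resume position after a full match. The names kmp_search and ABRACADABRA_NEXT are hypothetical.

```python
# "next" values for the pattern abracadabra, copied from the table above;
# list index 0 holds next[1], and the final 5 is the extra entry at m+1.
ABRACADABRA_NEXT = [0, 1, 1, 0, 2, 0, 2, 0, 1, 1, 0, 5]


def kmp_search(text, pat, nxt):
    """Return 1-based match positions of pat in text, given its next table."""
    n, m = len(text), len(pat)
    if m == 0:
        return []
    matches = []
    j = k = 1                      # j in pattern, k in text (both 1-based)
    while k <= n:
        if j == 0 or text[k - 1] == pat[j - 1]:
            k += 1                 # k advances in text, j in pattern
            j += 1
        else:
            j = nxt[j - 1]         # mismatch at pat[j] => try pat[next[j]]
        if j > m:                  # whole pattern matched
            matches.append(k - m)
            j = nxt[m]             # resume using the extra entry
    return matches


if __name__ == "__main__":
    print(kmp_search("abracadabracadabra", "abracadabra", ABRACADABRA_NEXT))
```

Note that k never moves backwards, so the text is scanned once.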
Knuth-Morris-Pratt - 2
- Explanation of "next" table example by tracing pattern pre-processing
- i=1, j=next[1]=0 **
- j==0 => i=2, j=1 (restart w. pat); pat[2]!=pat[1] => next[2]=1 ** ("b" failed, so back up and try "a")
- j!=0, pat[2]!=pat[1] => j=next[1]=0
- j==0 => i=3, j=1 (restart w. pat); pat[3]!=pat[1] => next[3]=1 ** ("r" failed, so back up and try "a")
- j!=0, pat[3]!=pat[1] => j=next[1]=0
- j==0 => i=4, j=1 (restart w. pat); pat[4]==pat[1] => next[4]=next[1]=0 ** ("a" failed => advance pat)
- j!=0, pat[4]==pat[1] => i=5, j=2; pat[5]!=pat[2] => next[5]=2 ** ("c" failed, but an "a" came before it, so check whether the text has "ab": later check position 2, "b")
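The trace above can be reproduced with a short sketch of the pre-processing loop. It follows the same i/j steps with 1-based positions. The function name init_next and the use of None as a sentinel for position m+1 are assumptions made for this illustration.

```python
def init_next(pat):
    """Build the 1-based KMP 'next' table for pat, with one extra entry at m+1.

    Returns a list whose index 0 holds next[1].
    """
    m = len(pat)
    # p[1..m] is the pattern; p[m+1] is a sentinel that equals no character.
    p = [None] + list(pat) + [None]
    nxt = [0] * (m + 2)            # nxt[0] is unused padding; next[1] = 0
    i, j = 1, 0
    while i <= m:
        if j == 0 or p[i] == p[j]:
            i += 1
            j += 1
            # Different chars: fall back to j. Same chars: reuse next[j],
            # since retrying the same char would fail again.
            nxt[i] = j if p[i] != p[j] else nxt[j]
        else:
            j = nxt[j]
    return nxt[1:]


if __name__ == "__main__":
    print(init_next("abracadabra"))
```

For abracadabra this yields the values listed in the "next" table example, including the final 5.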
Boyer-Moore - 1
- Shift maximum of 2 heuristics:
- 1) Match heuristic
- Shift right so that all chars already matched still match, and a new char lands at the match-check position
- Example: abracadabra
- ddhat[j] = min{s+m-j | ...} where s=skip amount
- a: no match so shift 1 to try the "r"; 1=1+11-11
- r: matched "a" so shift 3 to see if have "da"; 4=3+11-10
- b: matched "ra" so shift fully; 12=10+11-9
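The three worked values above (1, 4, 12) can be checked with a brute-force sketch. The slide elides the conditions inside min{...}. This sketch assumes the usual ones: every already-matched char at i > j must line up with an equal pattern char (or fall off the left end), and the char brought to position j must differ from pat[j] (or fall off the left end). The function name match_shift_table is hypothetical.

```python
def match_shift_table(pat):
    """Brute-force match-heuristic table; index 0 holds the value for j=1."""
    m = len(pat)
    p = " " + pat                  # pad so p[1..m] is the pattern

    def valid(s, j):
        # Chars already matched (positions j+1..m) must still match after
        # shifting the pattern right by s.
        for i in range(j + 1, m + 1):
            if s < i and p[i - s] != p[i]:
                return False
        # The char now at the mismatch position must be a new one.
        return s >= j or p[j - s] != p[j]

    # s = m always satisfies both conditions, so the min is well defined.
    return [min(s for s in range(1, m + 1) if valid(s, j)) + m - j
            for j in range(1, m + 1)]


if __name__ == "__main__":
    table = match_shift_table("abracadabra")
    print(table[10], table[9], table[8])   # j = 11, 10, 9
```

This is O(m^3) and meant only for checking small examples by hand; practical implementations build the table in linear time.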
Boyer-Moore - 2
- 2) Occurrence heuristic
- Given text char X that didn't match pat, find rightmost place in pat where X occurs, to try again, and return distance from right to shift
- Example: abracadabra
- "a" is 0 from right, "b" is 2 in, "c" is 6 in from right, . . .
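A small sketch of the occurrence table for the example pattern abracadabra: each character maps to the distance of its rightmost occurrence from the right end of the pattern. The function name occurrence_table is hypothetical.

```python
def occurrence_table(pat):
    """Map each pattern char to the distance of its rightmost occurrence
    from the right end of the pattern (0 = last position)."""
    m = len(pat)
    table = {}
    for idx, ch in enumerate(pat):
        # Later (more rightward) occurrences overwrite earlier ones.
        table[ch] = m - 1 - idx
    return table


if __name__ == "__main__":
    print(occurrence_table("abracadabra"))
```

Characters that do not occur in the pattern are absent from the table; a search routine would typically treat them as allowing a shift of the full pattern length.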
Simplified Boyer-Moore
- Boyer-Moore with only the occurrence heuristic
- Why?
- Patterns are not periodic
- Space is less
- Thus should be faster on average
- Actually, performs slightly less well than Boyer-Moore (see tests)
Boyer-Moore-Horspool
- Let X = last char in pattern; find T = char in text at X's position; use T to look up the skip in the heuristic table
- Boyer-Moore-Horspool-Sunday:
- go back to left to right checking
- uses only the occurrence test, but shifts based on the text char aligned with pat[m+1] (just past the end of the pattern)
- BMH gives best empirical results
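A minimal sketch of the Boyer-Moore-Horspool idea: the text char under the pattern's last position picks the shift. For brevity, the window check here is a plain slice comparison rather than an explicit right-to-left loop. The function name horspool_search is hypothetical.

```python
def horspool_search(text, pat):
    """Return 1-based match positions of pat in text (Horspool-style shifts)."""
    n, m = len(text), len(pat)
    if m == 0 or m > n:
        return []
    # Skip for each char = distance from its rightmost occurrence (excluding
    # the last pattern position) to the end; chars not present skip m.
    skip = {ch: m - 1 - idx for idx, ch in enumerate(pat[:-1])}
    matches = []
    pos = 0                                  # 0-based window start
    while pos <= n - m:
        if text[pos:pos + m] == pat:
            matches.append(pos + 1)
        # T = text char aligned with the last pattern char
        pos += skip.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabracadabra", "abracadabra"))
```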
Shift-Or - 1
Preprocessing:
- Let pat= pattern, A=pattern alphabet
- Each entry "a" in A is represented by a |pat| bit string, indicating where "a" occurs in the pattern
Searching:
- Initial state = |pat| 1's
- A match is encoded as a 0 bit; next state = (state << 1) OR T[char]
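The preprocessing and state update above can be sketched with Python integers as bit strings. In this sketch bit i of T[char] is 0 where the char occurs at pattern position i, and a 0 in the top bit of the state signals a match. The function name shift_or_search and the variable names are hypothetical.

```python
def shift_or_search(text, pat):
    """Return 1-based match positions of pat in text using Shift-Or."""
    m = len(pat)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # Preprocessing: one |pat|-bit mask per pattern char, 0 where it occurs.
    table = {}
    for i, ch in enumerate(pat):
        table[ch] = table.get(ch, all_ones) & ~(1 << i)
    state = all_ones                         # initial state = |pat| 1's
    matches = []
    for k, ch in enumerate(text):
        # Chars outside the pattern alphabet get an all-ones mask.
        state = ((state << 1) | table.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:      # top bit 0 => match ends at k
            matches.append(k - m + 2)        # convert to 1-based start
    return matches


if __name__ == "__main__":
    print(shift_or_search("abracadabracadabra", "abracadabra"))
```

Only the current text char and the state word are needed at each step, which is why no text buffering is required.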
Shift-Or - 2
Advantages:
- Easy to implement in hardware
- Only need to store 1 char. of text (no buffering or text storage)
- Can handle regular expressions: sets of chars, don't cares
- Extends to don't cares or errors in text
Karp-Rabin
- Compute signature fn of each possible m-character substring, compare with pat's signature, re-check strings to be sure (false matches are very rare)
- Could thus pre-process text if have many searches
- Signature at position i is computed in terms of the signature at position i-1
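A minimal sketch of the signature idea, assuming a polynomial hash modulo a fixed number. The base 256 and modulus 1000003 are illustrative choices, not values from the source, and the function name karp_rabin_search is hypothetical.

```python
def karp_rabin_search(text, pat, base=256, mod=1_000_003):
    """Return 1-based match positions of pat in text using rolling signatures."""
    n, m = len(text), len(pat)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)             # weight of the leading char
    sig_pat = sig_txt = 0
    for i in range(m):
        sig_pat = (sig_pat * base + ord(pat[i])) % mod
        sig_txt = (sig_txt * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal signatures may be a false match, so re-check the strings.
        if sig_pat == sig_txt and text[i:i + m] == pat:
            matches.append(i + 1)
        if i < n - m:
            # Signature at the next position from the current one:
            # drop text[i], shift, and add text[i+m].
            sig_txt = ((sig_txt - ord(text[i]) * high) * base
                       + ord(text[i + m])) % mod
    return matches


if __name__ == "__main__":
    print(karp_rabin_search("abracadabracadabra", "abracadabra"))
```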
Summary
- Use Boyer-Moore-Horspool usually
- Use Naive Alg. if |pat| < 3
- Use KMP if alphabet is large
- Use Shift-Or for complex regular expressions