IN - Affix Removal
This method is based on the linguistic concepts of roots and affixes; in English, the affixes that matter most are suffixes.
Most languages have morphological rules for adding suffixes to root forms: to convert between parts of speech (e.g., govern, government), to form plurals (e.g., form, forms), or to derive new words that extend other words (e.g., air, airline, airliner).
Lovins Method - Longest Match
- Longest match: remove the longest possible string, according to a set of rules (e.g., generations)
- Iterative longest match: remove the string in steps, one suffix at a time (e.g., generations becomes generation and then generate)
- Partial matching: handles conflation problems at match time by comparing only a prefix of each stem (e.g., matching on sk conflates sky with ski, the stem that came from skies)
- Recoding: fixes conflation problems by correcting the stem with a set of transformation rules (e.g., i becomes y in the stem from skies, ll becomes l, p becomes b)
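The four ideas above can be sketched in a few lines of Python. This is a toy illustration, not the Lovins rule set: the suffix lists, the minimum stem length, the recoding rules, the prefix length, and every function name are assumptions chosen so that the examples from the list work.

```python
# Toy data, assumed for illustration only (not the real Lovins tables).
COMPOUND_SUFFIXES = ["ions", "ion", "es", "s"]  # for single-pass longest match
SIMPLE_SUFFIXES = ["ion", "s"]                  # for one-suffix-at-a-time removal
RECODE_RULES = [("i", "y"), ("ll", "l")]        # (stem ending, replacement)
MIN_STEM = 2                                    # assumed minimum stem length


def strip_longest(word, suffixes):
    """Remove the longest suffix that still leaves at least MIN_STEM letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def strip_iteratively(word, suffixes):
    """Apply strip_longest repeatedly; return every intermediate form."""
    steps = [word]
    while True:
        shorter = strip_longest(steps[-1], suffixes)
        if shorter == steps[-1]:
            return steps
        steps.append(shorter)


def recode(stem, rules=RECODE_RULES):
    """Rewrite the stem ending using the first matching recoding rule."""
    for old, new in rules:
        if stem.endswith(old):
            return stem[: -len(old)] + new
    return stem


def partial_match(stem_a, stem_b, n=2):
    """Treat two stems as equal when their first n letters agree."""
    return stem_a[:n] == stem_b[:n]


print(strip_longest("generations", COMPOUND_SUFFIXES))    # generat
print(strip_iteratively("generations", SIMPLE_SUFFIXES))  # ['generations', 'generation', 'generat']
print(recode(strip_longest("skies", COMPOUND_SUFFIXES)))  # sky
print(partial_match("sky", "ski"))                        # True
```

Note that this toy version stops at generat rather than generate; producing a readable stem would need a further recoding rule. Recoding and partial matching are two alternative fixes for the same sky/ski mismatch: the first repairs the stem at indexing time, the second relaxes the comparison at match time.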
Porter Algorithm
- Porter, Van Rijsbergen, ...
- Compact, simple, but relatively accurate
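- Condition/action rules: conditions on the stem, on the suffix, and on the rules
- Stem condition: assume word form [C](VC)^m[V], where C is a run of consonants, V is a run of vowels, and m (the measure) is the number of VC pairs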
- m=0: TR, EE, TREE, Y, BY
- m=1: TROUBLE, OATS, TREES, IVY
- m=2: TROUBLES, PRIVATE, OATEN
- Other stem conditions: has a vowel, ends with a given letter, ends with a double consonant
- Suffix: pattern (sses, ies, ss, s; eed, ed; ing; ate, tion, ence, ance, ...)
- Replacement: sses to ss, ies to i, s to NULL, eed to ee, enci to ence, ...
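The measure and the condition/action rules above can be sketched as follows. This is a small fragment under stated assumptions, not a full Porter stemmer: it covers only the plural rules (sses, ies, ss, s) and the eed rule guarded by m > 0, and the function names are made up for illustration. It treats y as a vowel only when it follows a consonant, which is what the examples Y, BY, and IVY require.

```python
def is_consonant(word, i):
    """True if word[i] is a consonant; y counts as a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count the VC pairs in the form [C](VC)^m[V]."""
    stem = stem.lower()
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1  # a vowel run has just been closed by a consonant
        prev_vowel = not cons
    return m


# (suffix, replacement) pairs, tried longest first; only the first match applies.
PLURAL_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def apply_plural_rules(word):
    """Unconditional rules: sses to ss, ies to i, ss kept, s to NULL."""
    for suffix, replacement in PLURAL_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def apply_eed_rule(word):
    """Conditional rule: eed to ee, only if the remaining stem has m > 0."""
    if word.endswith("eed"):
        stem = word[:-3]
        if measure(stem) > 0:
            return stem + "ee"
    return word


print(measure("TREE"), measure("TROUBLE"), measure("PRIVATE"))  # 0 1 2
print(apply_plural_rules("caresses"), apply_plural_rules("ponies"))  # caress poni
print(apply_eed_rule("agreed"), apply_eed_rule("feed"))  # agree feed
```

The ss entry maps to itself on purpose: it stops the shorter s rule from turning caress into cares. The eed rule shows why the stem condition matters: feed is left alone because its stem f has m = 0.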