Class Summary, Sep 14 -- Srinivas R. Gaddam

Applications of PAT trees and their algorithms were discussed. Prefix searching finds all strings that have a given string as a prefix: we walk down the tree until the prefix is exhausted, and all the sistrings in the subtree below that point are the answer. Proximity searching finds all places where one string is at most a fixed number of characters away from another string. Range searching finds all strings that fall between two specified strings in lexicographic order. Longest repetition searching finds the longest match between two different positions of the text; it is given by the tallest internal node in the tree. Most frequent searching finds the strings that occur most frequently. Regular expression searching uses the concept of a finite automaton (specifically a deterministic finite automaton, or DFA).

How to build Patricia trees was discussed. PAT arrays were also discussed. Searching a PAT array takes about a factor of log N longer than searching a PAT tree.

The second part of the unit, String Matching, was then discussed. The naive algorithm is the brute-force solution; for a pattern of length m and a text of length n it takes O(mn) time. The KMP (Knuth-Morris-Pratt) algorithm preprocesses the pattern in O(m) time. The number of comparisons in the search is then bounded by a linear function of n, because after a mismatch the algorithm reuses the information gained from the previous comparisons and does not repeat them. The Boyer-Moore algorithm compares the pattern with the text from right to left within the pattern. If no mismatch occurs, the pattern has been found. Otherwise, the pattern is shifted by some amount and the comparison is repeated. Two heuristics are used to compute this shift. The Karp-Rabin algorithm uses hashing for string matching: a signature function is computed for each possible m-character substring of the text and compared with the signature of the pattern.
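The signature comparison in Karp-Rabin can be sketched as follows. This is only a minimal illustration, not the version presented in class: the function name `karp_rabin`, the polynomial rolling hash, and the values of `base` and `modulus` are all assumptions chosen for the example.

```python
def karp_rabin(text, pattern, base=256, modulus=1_000_003):
    """Return every index at which pattern occurs in text.

    Assumed signature: the characters read as a base-`base` number,
    reduced mod `modulus` (both values are illustrative choices).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in an m-character window.
    high = pow(base, m - 1, modulus)

    # Signatures of the pattern and of the first text window.
    p_sig = t_sig = 0
    for i in range(m):
        p_sig = (p_sig * base + ord(pattern[i])) % modulus
        t_sig = (t_sig * base + ord(text[i])) % modulus

    hits = []
    for s in range(n - m + 1):
        # Equal signatures may be a collision, so verify the characters.
        if p_sig == t_sig and text[s:s + m] == pattern:
            hits.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t_sig = ((t_sig - ord(text[s]) * high) * base
                     + ord(text[s + m])) % modulus
    return hits
```

For example, `karp_rabin("abracadabra", "abra")` returns `[0, 7]`. Each new signature is derived from the previous one in constant time, so the text is not rehashed from scratch at every position.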
The Karp-Rabin search takes O(n) time on average, although the worst case is O(mn) when many signatures collide.

Then unit 5, Ranking and Relevance Feedback, was introduced. From characters and strings, we now move on to words and concepts in this unit. An important observation is that the set of words in a text and their statistics go a long way toward capturing its information content. The issues to be addressed here are how to assign weights and how to perform query expansion. One way to expand a query is to consult a thesaurus.
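As a rough illustration of thesaurus-based query expansion, the sketch below adds the listed synonyms of each query term to the query. The function name `expand_query` and the toy thesaurus are hypothetical and were not part of the lecture; a real system would also have to decide how to weight the added terms.

```python
def expand_query(terms, thesaurus):
    """Return the query terms followed by their thesaurus entries.

    `thesaurus` is assumed to be a dict mapping a word to a list of
    related words. Order is preserved and duplicates are dropped.
    """
    expanded = []
    seen = set()
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical toy thesaurus, used only for this example.
toy_thesaurus = {"car": ["automobile", "vehicle"], "fast": ["quick"]}
```

With this thesaurus, `expand_query(["fast", "car"], toy_thesaurus)` returns `["fast", "quick", "car", "automobile", "vehicle"]`.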