Class Summary for 9/18 from Group #5: Lauren Barton, Nelson Kile, Caroline O'Hare, Martin Falck, Robert Ryan

The first part of class reviewed Part B of the String Searching section. In this section we discussed various methods of string matching, such as:

Naive Algorithm: the search is sequential, comparing the pattern against the text character by character at each starting position. The largest number of comparisons is about m x n, where m = length of string s1 and n = length of string s2. Used when the search string is shorter than 3 characters.
Knuth-Morris-Pratt: one pointer advances through the text while the other advances through the pattern being searched for; the text pointer never moves backward. Used when the alphabet is large.
Boyer-Moore: compares the pattern from right to left and takes advantage of matches within the pattern being searched for to skip ahead in the text.
Boyer-Moore-Horspool: the variant used most often.
Shift-Or: used for matching regular expressions.
Karp-Rabin: uses hashing for string matching.

The second part of the lecture reviewed the first three sections of the RR (relevance ranking) unit. We discussed the basic goal of improving the retrieval process so that it achieves higher precision (the proportion of retrieved documents that are relevant). Ranking orders documents so that those with a higher probability of relevance are retrieved first. Certain features, such as term weighting, can be used to rank documents.

The first model examined for ranking documents was the Vector Space Model. In this model, documents and queries are represented as vectors of term weights, and documents are ranked by their similarity to the query. Feedback is used to tune the query and retrieve even more relevant documents.

A second model examined was the Probabilistic Model. This model is based on the assumption that terms should be given greater weights if they appeared in relevant documents retrieved by a previous query. So if a user is attempting to locate documents on a particular topic, it makes sense to assume that the relevant documents retrieved by an earlier query are related to the present query.
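The Knuth-Morris-Pratt description above (one pointer in the text, one in the pattern) can be sketched as follows. This assumes the common "failure function" formulation, where `fail[i]` is the length of the longest proper prefix of `pattern[:i+1]` that is also a suffix of it; the function names are hypothetical.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(pattern: str, text: str) -> list[int]:
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_function(pattern)
    matches: list[int] = []
    j = 0  # pattern pointer
    for i, ch in enumerate(text):  # text pointer: only ever moves forward
        # On a mismatch, fall back in the pattern instead of backing up in the text.
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = fail[j - 1]  # keep going so overlapping matches are found
    return matches
```

For example, `kmp_search("abab", "abababab")` finds the overlapping matches at positions 0, 2, and 4 while reading each text character once.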
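Since Boyer-Moore-Horspool is noted as the variant used most often, here is a minimal sketch of it. It compares right to left like Boyer-Moore but keeps only a single shift table, indexed by the text character aligned with the last pattern position. The name `horspool_search` is illustrative.

```python
def horspool_search(pattern: str, text: str) -> list[int]:
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Other characters shift by m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    matches: list[int] = []
    pos = 0
    while pos <= n - m:
        # Compare right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift based on the text character under the last pattern position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

When the character under the end of the pattern does not occur in the pattern at all, the whole pattern jumps past it, which is why this family tends to skip most of the text on large alphabets.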
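One reason Shift-Or suits regular-expression-style matching is that each pattern position can accept a whole class of characters at no extra cost. The sketch below assumes the pattern is given as a sequence of strings, each listing the characters allowed at that position (a plain string such as "cat" also works). Function and parameter names are hypothetical.

```python
def shift_or_search(pattern_classes, text: str) -> list[int]:
    """pattern_classes: one string of allowed characters per pattern position."""
    m = len(pattern_classes)
    if m == 0:
        return list(range(len(text) + 1))
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared (0) when character c is allowed at position i.
    masks: dict[str, int] = {}
    for i, allowed in enumerate(pattern_classes):
        for ch in allowed:
            masks[ch] = masks.get(ch, all_ones) & ~(1 << i)
    # Bit i of state is 0 when pattern positions 0..i match the text ending here.
    state = all_ones
    matches: list[int] = []
    for pos, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if (state & (1 << (m - 1))) == 0:
            matches.append(pos - m + 1)
    return matches
```

With `["c", "ao", "t"]` the sketch matches both "cat" and "cot", roughly the regular expression `c[ao]t`, using one shift and one OR per text character.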
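The Karp-Rabin entry says only that the algorithm uses hashing. A minimal sketch, assuming a polynomial rolling hash with base 256 and a large prime modulus (both are illustrative choices, not from the lecture), looks like this:

```python
def karp_rabin_search(pattern: str, text: str,
                      base: int = 256, modulus: int = 1_000_000_007) -> list[int]:
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    if m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    matches: list[int] = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Roll the window: remove text[start], append text[start + m].
            t_hash = (t_hash - ord(text[start]) * high) % modulus
            t_hash = (t_hash * base + ord(text[start + m])) % modulus
    return matches
```

Each window's hash is updated in constant time from the previous one, so the text is scanned once and full character comparisons happen only when hashes agree.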
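The Vector Space Model paragraph can be illustrated with a small ranking sketch. It assumes tf-idf term weighting and cosine similarity, which are common choices but not necessarily the exact weighting used in class; the documents and function names are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Weight each term by its frequency in the document times its inverse document frequency."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query: list[str], docs: list[list[str]]) -> list[tuple[float, int]]:
    """Return (similarity, document index) pairs, most similar to the query first."""
    vectors, idf = tf_idf_vectors(docs)
    q = {term: tf * idf.get(term, 0.0) for term, tf in Counter(query).items()}
    scores = [(cosine(q, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

docs = [
    "string searching with boyer moore".split(),
    "ranking documents with the vector space model".split(),
    "probabilistic model for ranking documents".split(),
]
```

Ranking `"vector space ranking".split()` against these documents puts document 1 first: it shares the rare (high-idf) terms "vector" and "space" with the query, while document 2 shares only the more common "ranking".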
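The summary states the Probabilistic Model's idea without a formula. One standard way to turn it into term weights is the Robertson/Sparck Jones relevance weight (with 0.5 added to each count), which compares how often a term occurs in documents judged relevant after an earlier query against how often it occurs in the rest of the collection. The sketch below assumes that formulation; the function names and the example counts are hypothetical.

```python
import math

def relevance_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson/Sparck Jones weight for one term, with 0.5 smoothing.

    N: documents in the collection
    n: documents containing the term
    R: documents judged relevant after the earlier query
    r: relevant documents that contain the term
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_rest = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_rest)

def score(doc_terms: set[str], weights: dict[str, float]) -> float:
    """Rank score: sum of the weights of the query terms the document contains."""
    return sum(w for term, w in weights.items() if term in doc_terms)

# Hypothetical counts: 1000 documents, 10 judged relevant by the earlier query.
weights = {
    "vector": relevance_weight(N=1000, n=50, R=10, r=8),  # common among relevant docs
    "string": relevance_weight(N=1000, n=50, R=10, r=1),  # rare among relevant docs
}
```

Both terms appear in 50 documents overall, but "vector" appears in 8 of the 10 relevant ones and so receives the larger weight, which is the behavior the summary describes.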
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Tom Kalafut 09/18/95 class summary

We glanced over several algorithms for string searching, with emphasis on when and why you would use each type of algorithm. We then started on the relevance ranking material, including sub-vectors, query moving, and probabilistic algorithms.
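"Query moving" is mentioned here without a formula. A common way to move a query vector toward documents judged relevant (and away from non-relevant ones) is Rocchio's formula; the sketch below assumes that formulation, with alpha = 1.0, beta = 0.75, and gamma = 0.15 as conventional defaults rather than values from the lecture. Vectors are dictionaries from term to weight, and the function name is hypothetical.

```python
def rocchio(query: dict[str, float],
            relevant: list[dict[str, float]],
            nonrelevant: list[dict[str, float]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> dict[str, float]:
    """Move the query toward the centroid of relevant docs and away from non-relevant ones."""
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)  # collect every term seen in any vector
    moved: dict[str, float] = {}
    for term in terms:
        w = alpha * query.get(term, 0.0)
        if relevant:
            w += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:  # drop terms whose weight is no longer positive
            moved[term] = w
    return moved

new_query = rocchio({"ranking": 1.0},
                    relevant=[{"ranking": 1.0, "vector": 1.0}],
                    nonrelevant=[{"string": 1.0}])
```

Here the moved query keeps "ranking" (now 1.75), picks up "vector" (0.75) from the relevant document, and drops "string", whose weight would be negative.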