Trish Heiman
Summary of Monday's class (9/11)

I would like to have spent more time on inverted file systems. I am a little behind in my reading; hopefully the book will clarify. PAT trees seem to provide a logical way of organizing "addresses" with as little memory as possible. Hopefully next week's class will clarify the connection to document storage and retrieval.

...

I thought the debates for the Digital Library unit were valuable. However, the groups need to be separated; it was just too noisy and distracting. If we debate again, I would suggest having us go off for 30-45 minutes and then come back and have general discussion with you.

Also, I love the class notes on the WWW. Because of these notes and summaries, I don't feel like I have to take notes in class and therefore I can listen more closely. I would prefer more examples and discussion and less reading of the bulleted items. Maybe we could solicit examples from students and promote more class interaction.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Class summary for the week of 9/11.
Carolyn O'Hare, Lauren Barton, Robert Ryan, Martin Falck, Nelson Kile

We began class with a discussion of the problems with Boolean queries. Some of the problems with Boolean queries are that they do not allow ranking, weights on query terms, or weights on document terms.

The different Extended Boolean Models were examined. These models include fuzzy set theory, MMM Model, Paice Model and P-Norm Model.

Fuzzy Set Theory
- uses membership values in the range 0 to 1 instead of a binary choice of 0 or 1
- redefines AND as MIN
- redefines OR as MAX
- evaluates NOT B as 1 - value(B)

MMM Model
- redefines AND and OR as linear combinations of MIN and MAX
- AND = Cand * MIN + (1 - Cand) * MAX
- OR = Cor * MAX + (1 - Cor) * MIN

Paice Model
- considers all terms in the query
- uses a normalized geometric series
- uses a single coefficient, r, which is 1 for AND and .7 for OR queries
- sorts document terms based on weights

P-Norm Model
- considers all terms in the query
- parameterizes the strictness of each AND or OR operator with a p-value

All of the extended models are more expensive than Boolean but are also more effective. Of the extended models, MMM is computationally cheaper and more effective than Paice and P-Norm.

In the second part of the class, we began discussing string searching and PAT trees. In string searching, the three types of text searching are cluster trees, hashing and sorted indexes. With PAT trees, we learned how to read and traverse a PAT tree. A PAT tree was defined as a binary digital (Patricia) tree built over all sistrings (semi-infinite strings) of a text. The different algorithms that are performed on PAT trees are:
- Prefix searching
- Proximity searching
- Range searching
- Longest repetition searching
- Most frequent searching
- Regular expression searching
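
As a rough illustration of the fuzzy set and MMM operators summarized earlier in these notes, the following Python sketch evaluates AND, OR and NOT over document term weights in the range 0 to 1. The function names and the sample coefficient value of 0.7 are assumptions made for illustration; they do not come from the class notes.

```python
# Sketch of the fuzzy set and MMM operators from the summary.
# Function names and the sample coefficient are illustrative assumptions.

def fuzzy_and(weights):
    """Fuzzy set AND: the minimum of the term weights."""
    return min(weights)


def fuzzy_or(weights):
    """Fuzzy set OR: the maximum of the term weights."""
    return max(weights)


def fuzzy_not(weight):
    """Fuzzy set NOT B: 1 - value(B)."""
    return 1.0 - weight


def mmm_and(weights, c_and):
    """MMM AND = Cand * MIN + (1 - Cand) * MAX."""
    return c_and * min(weights) + (1.0 - c_and) * max(weights)


def mmm_or(weights, c_or):
    """MMM OR = Cor * MAX + (1 - Cor) * MIN."""
    return c_or * max(weights) + (1.0 - c_or) * min(weights)


if __name__ == "__main__":
    # Hypothetical weights of two query terms in one document.
    doc_weights = [0.2, 0.9]
    print("fuzzy AND:", fuzzy_and(doc_weights))
    print("fuzzy OR: ", fuzzy_or(doc_weights))
    print("fuzzy NOT of first term:", fuzzy_not(doc_weights[0]))
    # 0.7 is an arbitrary sample coefficient, not a recommended setting.
    print("MMM AND:", mmm_and(doc_weights, c_and=0.7))
    print("MMM OR: ", mmm_or(doc_weights, c_or=0.7))
```

With a coefficient of 1, the MMM operators reduce to the plain fuzzy MIN and MAX. Lower values let the opposite extreme contribute to the score, which is how the model softens a strict AND or OR.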