LEBO88 Article Summary, Unit KB, Group 5, 5604n
From: (Group 5) Shirley Carr, Mike Joyce, Zakia Khan, Vas Madhava

Article Summary (LEBO88): The Use of Memory in Text Processing

Natural language programs should get smarter as they read more and more text. Unfortunately, they don't, so the implementor is forced to hard-code all the information. This article describes RESEARCHER, an experimental system that reads patent abstracts and builds a long-term memory.

Traditionally, ambiguity has been resolved by semantic characteristics, but this has two problems: 1) the disambiguating information is static and cannot change, and 2) the semantic categories are ad hoc. RESEARCHER instead resolves ambiguity by use of long-term memory: it asks questions of the memory, most of which concern relationships between nouns. RESEARCHER resolves a relationship by first looking at prior examples in memory; if none are found, it falls back on general principles, which are also derived from incoming text.

The authors assert that all ambiguities are of five types, as shown in figure 2 (p. 1486). They assume that processing is linear and parallel, with interaction between the various levels occurring at defined points.

RESEARCHER processes patent abstracts by 1) using basic syntax rules to identify the objects in the ultimate representation, and 2) combining these objects by applying appropriate rules. It uses only very basic syntactic information. It breaks text into two types of segments:

1) Segments that describe physical objects. These are typically noun phrases and are called "memettes", after their representation in memory.
2) Segments that relate objects together.

Processing is done in two phases:

a) Memette identification. Noun phrases are identified and checked to determine whether they have been mentioned before. A "save and skip" strategy is used. The most difficult part is determining the relationships between the various words.
b) Memette relation.
Here memettes are put together to build a final representation, all in short-term memory. Three kinds of relationships bind them:

i. Objects being components of other objects.
ii. Physical relationships between memettes (one is above another).
iii. Functional relationships between memettes (one activates another).

This is by no means simple; much ambiguity exists, especially in complicated texts. Use of memory is optimized: when short-term memory processing runs into an ambiguity, the system looks through long-term memory for relationships, so results improve as memory grows. Once candidate relationships have been identified, the issue becomes which one is most plausible. The rule used is to start with the most general relationship and work down to the specific. Matching objects is not a trivial task; it takes a lot of smarts.

One of the big negatives of this memory-based approach is that a relationship often can't be found when a simple heuristic would have revealed it. Another area of concern is what to do if the data in memory is incorrect: how can it be fixed? The article concludes with an example of RESEARCHER in use.

======================================================================

KB Article Summaries by Group I: Fitzgerald, Kalafut, Klein, and Muhlenburg.

"The Use of Memory in Text Processing" by Michael Lebowitz

Natural language processors do not "learn" from reading, or more specifically, from processing text. RESEARCHER is a newer natural language processor, specific to patents, that learns by using dynamic memory. It builds a generalization-based long-term memory, i.e., a knowledge base. RESEARCHER uses memory for understanding and performs disambiguation by asking questions of its memory. RESEARCHER processes patent abstracts by using basic syntactic rules to identify objects (memettes) and combining these objects with three types of relationships: componential, physical, and functional.
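The scheme both summaries describe can be sketched in code: memettes bound by the three relationship types, with a prior example in long-term memory preferred over a general fallback. This is a minimal illustrative sketch, not Lebowitz's implementation; all names (Memette, resolve_relation, the motor/gear example) are invented here.

```python
# Hypothetical sketch of memette relation, assuming memory is a flat list of
# (relation_type, object, object) triples. Not the actual RESEARCHER design.

COMPONENTIAL = "component-of"   # one object is a component of another
PHYSICAL = "physical"           # e.g., one memette is above another
FUNCTIONAL = "functional"       # e.g., one memette activates another

class Memette:
    """A memory unit representing a physical object named by a noun phrase."""
    def __init__(self, name):
        self.name = name
        self.relations = []     # (relation_type, other_memette) pairs

    def relate(self, rel_type, other):
        self.relations.append((rel_type, other))

def resolve_relation(memory, a, b):
    """Prefer a relation previously seen in long-term memory for this pair
    of objects; otherwise fall back to the most general relation type."""
    for rel_type, x, y in memory:
        if (x, y) == (a.name, b.name):
            return rel_type
    return COMPONENTIAL         # most general default in this sketch

# Usage: a prior example in memory decides the new relation.
long_term_memory = [(FUNCTIONAL, "motor", "gear")]
motor, gear = Memette("motor"), Memette("gear")
rel = resolve_relation(long_term_memory, motor, gear)
motor.relate(rel, gear)
```

With the stored example present, the pair is bound functionally; an unseen pair falls through to the general default, mirroring the "prior examples first, general principles second" rule.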
RESEARCHER uses memory to determine which stored objects refer more plausibly to a new object. It also looks through memory for previous cases of X being related to Y in some way. Memory ends up becoming a set of generalization hierarchies. When more than one relevant example exists in memory, usually the more general one is used. There is also the question of objects meaning different things within different domains. The P58 example run through RESEARCHER supports the conclusion that using memory to derive relations is much more satisfying and robust than using ad hoc disambiguation rules or heuristics.
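The "prefer the more general example" rule can be illustrated with a toy generalization hierarchy. This is a hedged sketch under invented names and data; the hierarchy below is made up for illustration and is not from the article.

```python
# Hypothetical sketch: memory as a generalization hierarchy, where each
# concept maps to its more general parent. When several relevant stored
# examples exist, the one closer to the root (more general) is chosen.

HIERARCHY = {
    "dot-matrix printer": "printer",
    "laser printer": "printer",
    "printer": "output device",
    "output device": "device",
}

def depth(concept):
    """Distance from the hierarchy root; smaller means more general."""
    d = 0
    while concept in HIERARCHY:
        concept = HIERARCHY[concept]
        d += 1
    return d

def most_general(examples):
    """When more than one relevant example exists, use the most general."""
    return min(examples, key=depth)

# Usage: between a specific and a general stored example, prefer the general.
chosen = most_general(["laser printer", "printer"])  # -> "printer"
```

Representing memory this way is what lets results improve as text is read: each new instance either matches an existing node or extends the hierarchy with a new generalization.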