Within AI, one key area is natural language processing (NLP), together with the related field of computational linguistics. NLP can be applied to the human-computer interface to make information retrieval (IR) easier for casual users and to encourage a more flexible dialog in which user modeling, explanation, tutoring, and learning can all take place.
NLP can also be applied to query analysis, document analysis, or both, leading to a detailed knowledge representation. Processing is faster when applied only to the query, where at least phrase identification usually occurs. More complex representations, such as conceptual graphs [4] or frame systems (as for chemical reactions), can also be developed.
NLP applied to the documents has great potential, since a richer representation can then be developed. Once again, identifying phrases is a sensible goal, though it is still not clear that this approach is better than statistical phrase identification methods. In general, parsing for large collections must be very fast, and so may be shallow or partial, perhaps only identifying dependencies. It must also be robust: tolerant of minor spelling or grammar errors and of the occurrence of new words.
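As a point of comparison for the parsing approach, statistical phrase identification can be sketched in a few lines. The sketch below is a minimal illustration, not a method taken from these notes: it treats any adjacent word pair that recurs across the collection as a candidate phrase. The function names, the crude tokenizer, the `min_count` threshold, and the sample documents are all assumptions made for the example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(documents, min_count=2):
    """Return adjacent word pairs that occur at least min_count times.

    Pairs are counted within each document, so no pair spans a
    document boundary. No stopword filtering is attempted here.
    """
    pair_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        pair_counts.update(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical three-document collection.
docs = [
    "Information retrieval systems index documents.",
    "Evaluating information retrieval is hard.",
    "Retrieval of information from text.",
]
phrases = candidate_phrases(docs)
```

On this toy collection only the pair ("information", "retrieval") recurs, so it is the only candidate. A scheme like this needs no parser and tolerates unknown words, which is part of why it is a hard baseline for NLP-based phrase identification to beat.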
With richer representations, one often thinks about IR at the level of concepts. This may involve use of a thesaurus, or of AI-constructed equivalents, which map from the world of words to the world of ``concepts.'' Sometimes these are called ``topics'' or ``frames''; alternatively, a semantic network or graph representation may be used. The two readings for this unit are relevant here: the first illustrates the need to have many aliases for a concept, to enhance recall; the second focuses on (sense) disambiguation through the use of semantics-oriented NLP and of lexical and domain memory.
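The word-to-concept mapping can be illustrated with a small sketch. It assumes a hand-built thesaurus in which each concept lists its aliases; the concept names, the aliases, and the helper functions are hypothetical and exist only to show why many aliases per concept help recall.

```python
# Hypothetical thesaurus: each concept lists the aliases that map to it.
THESAURUS = {
    "AUTOMOBILE": ["car", "automobile", "auto", "motorcar"],
    "PHYSICIAN": ["doctor", "physician", "md"],
}

# Inverted view (alias -> concept) for fast lookup.
ALIAS_TO_CONCEPT = {
    alias: concept
    for concept, aliases in THESAURUS.items()
    for alias in aliases
}


def to_concepts(text):
    """Map each word to its concept; unknown words pass through unchanged."""
    return [ALIAS_TO_CONCEPT.get(word, word) for word in text.lower().split()]


def concept_match(query, document):
    """Return True if query and document share at least one thesaurus concept."""
    query_concepts = {c for c in to_concepts(query) if c in THESAURUS}
    doc_concepts = {c for c in to_concepts(document) if c in THESAURUS}
    return bool(query_concepts & doc_concepts)
```

With this table, the query "car repair" matches a document containing "motorcar" even though the two share no word. The sketch does nothing about ambiguity: an alias such as "md" is always mapped to PHYSICIAN, which is exactly the kind of case where sense disambiguation is needed.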
Given such representations, inference methods can help, for example through query expansion. Graph matching may be needed for complex queries, and may follow an initial screening or filtering step based on simpler, statistical selection. Bayesian network or rule-based schemes have good potential to help during the matching.
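A minimal sketch of query expansion followed by a cheap screening pass is given below. It is only an illustration under stated assumptions: the related-term table, the document collection, and the function names are hypothetical, and the expensive second stage (such as graph matching) is left out, since it would run only on the documents that survive screening.

```python
# Hypothetical related-term table; in practice it might come from a thesaurus.
RELATED = {
    "car": {"automobile", "vehicle"},
    "engine": {"motor"},
}


def expand_query(terms):
    """Return the query terms together with any related terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= RELATED.get(term, set())
    return expanded


def screen(documents, query_terms, min_overlap=1):
    """Cheap first pass: keep ids of documents that share enough terms with the query."""
    return [
        doc_id
        for doc_id, text in documents.items()
        if len(query_terms & set(text.lower().split())) >= min_overlap
    ]


# Hypothetical collection keyed by document id.
documents = {
    "d1": "automobile motor repair",
    "d2": "garden plants and soil",
    "d3": "car engine noise",
}
query = expand_query(["car", "engine"])
survivors = screen(documents, query)
```

Here the unexpanded query would retrieve only d3, while the expanded query also admits d1. Raising `min_overlap` makes the screen stricter, trading recall for less work in the later matching stage.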
Matching can be improved, as can the interface, through user modeling. This often involves some initial setup of stereotypes, knowledge acquisition so that the user can be accurately characterized, and differential analysis to compare the user's behavior with that of the stereotype.
Finally, machine learning can be used in a variety of IR situations. The normal feedback scheme is one example. Training data or samples are essential so that parameters can be tuned. Good methods must converge quickly to (near-)optimal solutions. Genetic algorithms and neural nets have been considered, but it is not yet clear whether and how they can help.
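Assuming the feedback scheme meant here is relevance feedback, one well-known form is a Rocchio-style query update, sketched below. This is a swapped-in illustration rather than the scheme these notes necessarily have in mind. Queries and documents are represented as term-to-weight dictionaries, and the weights `alpha`, `beta`, and `gamma` are the tunable parameters; the default values are illustrative only and would be tuned on training data.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update: move the query toward relevant documents
    and away from nonrelevant ones.

    query is a dict term -> weight; relevant and nonrelevant are lists
    of such dicts. Terms whose updated weight is not positive are dropped.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    updated = {}
    for term in terms:
        # Average weight of the term in each judged set (0 if the set is empty).
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        if weight > 0:
            updated[term] = weight
    return updated


# Hypothetical judgments for the ambiguous query "jaguar".
new_query = rocchio(
    {"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "car": 1.0}],
    nonrelevant=[{"jaguar": 1.0, "cat": 1.0}],
)
```

In this example the updated query gains the term "car" from the relevant document, while "cat" receives a negative weight and is dropped.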