Boolean Model
Boolean Set Processing / Venn Diagram
Boolean Set Processing / Inverted File Results
Documents are made up of words or other kinds of terms (numbers,
phrases, or other features).
We could therefore build a matrix with a row for each document and
a column for each term, where each entry counts the number of
occurrences of the term in the document.
Because most entries in such a matrix are zero, we can compress it into an inverted file.
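To make the matrix-to-inverted-file step concrete, here is a minimal sketch over a hypothetical three-document collection. It assumes whitespace tokenization and raw occurrence counts; the names `docs`, `matrix`, and `inverted` are illustrative only. The inverted file is just the matrix read column by column with the zero entries dropped.

```python
from collections import Counter

# Hypothetical toy collection: doc id -> text.
docs = {
    1: "boolean retrieval uses boolean operators",
    2: "an inverted file maps terms to documents",
    3: "boolean queries over an inverted file",
}

# Document x term matrix: one row per document, one column per term;
# each entry is the number of occurrences of the term in the document.
counts = {d: Counter(text.split()) for d, text in docs.items()}
vocab = sorted({t for c in counts.values() for t in c})
matrix = {d: [c[t] for t in vocab] for d, c in counts.items()}

# Inverted file: read the matrix column by column (by term) and keep
# only the nonzero entries as (doc, count) pairs.
inverted = {
    t: [(d, row[j]) for d, row in sorted(matrix.items()) if row[j] > 0]
    for j, t in enumerate(vocab)
}

print(inverted["boolean"])   # [(1, 2), (3, 1)]
print(inverted["inverted"])  # [(2, 1), (3, 1)]
nonzero = sum(len(p) for p in inverted.values())
print(len(docs) * len(vocab), "matrix cells vs", nonzero, "postings")
# 39 matrix cells vs 17 postings
```

Even on this tiny collection fewer than half the cells are nonzero. A real indexer would normally build the postings directly while scanning documents rather than materializing the dense matrix first; the matrix is kept here only to show the correspondence.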
- The inverted file gets its name because it views the matrix
  of documents x terms the inverse way: by term instead of by document
  (the usual order for readers).
- Extending the inverted file to handle proximity takes extra space.
- More precise searching lets users constrain word
  combinations further, requiring the words to be, for example:
  - adjacent
  - within n words of each other
  - in the same sentence
  - in the same paragraph
- Entries for a term might include (doc, wt), i.e. (document, weight), pairs plus:
  - A list of locations inside the document, given as:
    - Byte offset (or offsets for start/end); or
    - Paragraph number, sentence number, word number; or
    - Pointer into a structure tree (or pointers for a span):
      - e.g., chapter no. / section no. / subsection no. / par. no.;
      - e.g., reference no. / title part / subtitle field;
      - e.g., dictionary headword / part of speech / sense / definition.
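The location lists and proximity operators above fit together as in the following sketch, which stores a (paragraph number, sentence number, word number) triple for every occurrence of a term. It assumes that paragraphs are separated by blank lines, that sentences end with ".", and that word numbers count from the start of the document; `build_positional_index`, `near`, and the test functions are hypothetical names, not an established API.

```python
import re

def build_positional_index(docs):
    """Map term -> doc id -> list of (paragraph, sentence, word) numbers.

    Assumptions: paragraphs are separated by blank lines, sentences end
    with '.', and word numbers count from the start of the document.
    """
    index = {}
    for doc_id, text in docs.items():
        word_no = 0
        for p_no, para in enumerate(text.split("\n\n")):
            sentences = [s for s in para.split(".") if s.strip()]
            for s_no, sent in enumerate(sentences):
                for word in re.findall(r"\w+", sent.lower()):
                    locs = index.setdefault(word, {}).setdefault(doc_id, [])
                    locs.append((p_no, s_no, word_no))
                    word_no += 1
    return index

# Proximity tests on two locations x (for term a) and y (for term b).
def adjacent(x, y):
    return y[2] - x[2] == 1          # b immediately follows a

def within(n):
    return lambda x, y: abs(x[2] - y[2]) <= n

def same_sentence(x, y):
    return x[:2] == y[:2]            # same (paragraph, sentence)

def same_paragraph(x, y):
    return x[0] == y[0]

def near(index, a, b, test):
    """Doc ids where some occurrence of a and some occurrence of b pass test."""
    pa, pb = index.get(a, {}), index.get(b, {})
    return sorted(
        d for d in pa.keys() & pb.keys()
        if any(test(x, y) for x in pa[d] for y in pb[d])
    )

docs = {
    1: "Boolean queries use operators. An inverted file answers them.\n\n"
       "Proximity needs word positions.",
    2: "An inverted index stores positions. Queries run fast.",
}
idx = build_positional_index(docs)
print(near(idx, "an", "inverted", adjacent))                # [1, 2]
print(near(idx, "inverted", "positions", within(3)))        # [2]
print(near(idx, "queries", "positions", same_sentence))     # []
print(near(idx, "queries", "positions", same_paragraph))    # [2]
```

Every occurrence now carries a location triple instead of contributing only to a single count, which is the extra space that proximity support costs. Pairwise comparison of all locations is the simplest correct approach; practical systems usually merge the sorted location lists instead.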
Online Searching in the Boolean Model
- Phases
  - Clarify info. need / problem
  - Identify access points: a, t, s, ...
  - Identify concepts, terms
  - Develop, try, adapt search strategies
  - Examine results, use feedback
- Query organization (* = AND, + = OR)
  - String of pearls (disjunctive normal form, DNF: [A*B*C]+[H*I*J+K]+[X*Y*Z])
  - List of required concepts (conjunctive normal form, CNF: [A+B]*[H+I+J+K+L])
- Concept organization building on:
  - Elements: descriptors, phrases, words, stems/roots
  - Relationships: synonym, cross-reference (xref), broader term (BT)
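Both query organizations map directly onto set operations over the inverted file: AND (`*`) is intersection and OR (`+`) is union. The following minimal sketch assumes postings are Python sets of hypothetical document ids, that `*` binds tighter than `+` (so `[H*I*J+K]` splits into the clauses `H*I*J` and `K`), and that every clause list is nonempty; `eval_dnf` and `eval_cnf` are illustrative names.

```python
# Hypothetical postings: term -> set of doc ids containing the term.
postings = {
    "A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4},
    "H": {2, 5, 6}, "I": {5}, "J": {5, 7}, "K": {8},
}

def docs_for(term):
    return postings.get(term, set())   # unknown terms match no documents

def eval_dnf(clauses):
    """OR of ANDs ("string of pearls"): a doc matches if it contains
    every term of at least one clause."""
    return set().union(*(set.intersection(*map(docs_for, c)) for c in clauses))

def eval_cnf(clauses):
    """AND of ORs ("required concepts"): a doc matches if it contains
    at least one term from every clause."""
    return set.intersection(*(set().union(*map(docs_for, c)) for c in clauses))

# [A*B*C]+[H*I*J+K]+[X*Y*Z], reading [H*I*J+K] as the clauses H*I*J and K.
dnf = [["A", "B", "C"], ["H", "I", "J"], ["K"], ["X", "Y", "Z"]]
# [A+B]*[H+I+J+K+L]
cnf = [["A", "B"], ["H", "I", "J", "K", "L"]]

print(sorted(eval_dnf(dnf)))  # [3, 5, 8]
print(sorted(eval_cnf(cnf)))  # [2]
```

In the CNF form each clause collects alternative terms (for example synonyms) for one required concept, so adding terms to a clause broadens that concept while adding clauses narrows the whole query. In the DNF form each pearl is an independent, fully specified combination.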