language barriers
Increasingly, vast collections of data are becoming available in languages
other than English. In most cases, no translations are available.
But full-text and fielded searching is practical for most non-English document
collections. By using a bilingual dictionary, a query could be translated
into the language of the remote site, allowing a person to locate useful
information without any knowledge of the language in which the documents
are written. The researcher can then work with translation software
or a human translator to translate the specific documents judged to be
relevant.
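A minimal sketch of dictionary-based query translation follows. It assumes a tiny hypothetical English-to-Spanish dictionary (BILINGUAL_DICT) and a remote site that accepts AND/OR operators. Terms with several translations become an OR group, and terms missing from the dictionary (often proper nouns) pass through unchanged. Word-by-word lookup like this ignores phrases and word sense, which is one reason the retrieved documents still need translation and review.

```python
# Hypothetical English -> Spanish dictionary; a real system would load a
# full bilingual lexicon. Each term may have several candidate translations.
BILINGUAL_DICT = {
    "water": ["agua"],
    "pollution": ["contaminación", "polución"],
    "river": ["río"],
}


def translate_query(terms, dictionary):
    """Translate query terms word by word into the remote site's language."""
    translated = []
    for term in terms:
        candidates = dictionary.get(term.lower())
        if not candidates:
            translated.append(term)  # unknown term: keep it as typed
        elif len(candidates) == 1:
            translated.append(candidates[0])
        else:
            # Several senses: OR them together so no sense is lost.
            translated.append("(" + " OR ".join(candidates) + ")")
    return " AND ".join(translated)


print(translate_query(["river", "pollution", "Amazon"], BILINGUAL_DICT))
```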
search option variations
Beyond exact-match searching, search sites employ many mechanisms that
can improve the precision of a search. Among them are word truncation,
Boolean operators, field-level operators, and proximity and range searches.
These mechanisms are implemented in different ways on different systems,
and not every mechanism is valid on every site. For example, full-text
search sites may not contain any information about the fields within the
documents they index, so field-level operators have no meaning there.
A range search might be useful when trying to locate documents from
a specific date or period.
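The sketch below illustrates query mapping under stated assumptions: one site-independent query is rendered into each site's own syntax, and any feature a site cannot handle is dropped and reported rather than silently ignored. The SiteSyntax and Query classes, the field:value notation, and the date:[from TO to] range syntax are all hypothetical stand-ins; real sites each define their own operators. Proximity operators are left out to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteSyntax:
    """Hypothetical description of how one site expresses query features."""
    and_op: str = "AND"              # e.g. "AND" or "&", depending on the site
    truncation: str | None = "*"     # wildcard character, None if unsupported
    supports_fields: bool = True
    supports_ranges: bool = True


@dataclass
class Query:
    """Site-independent query: terms plus optional fields and a date range."""
    terms: list[str]
    truncate: set[str] = field(default_factory=set)       # terms to truncate
    fields: dict[str, str] = field(default_factory=dict)  # field -> value
    date_range: tuple[str, str] | None = None


def map_query(q, site):
    """Render q in the site's syntax; return (query_string, dropped_features)."""
    parts, dropped = [], []
    for t in q.terms:
        if t in q.truncate:
            if site.truncation:
                t += site.truncation
            else:
                dropped.append(f"truncation of {t!r}")
        parts.append(t)
    for name, value in q.fields.items():
        if site.supports_fields:
            parts.append(f"{name}:{value}")
        else:
            parts.append(value)  # fall back to a plain full-text term
            dropped.append(f"field {name!r}")
    if q.date_range:
        if site.supports_ranges:
            parts.append(f"date:[{q.date_range[0]} TO {q.date_range[1]}]")
        else:
            dropped.append("date range")
    return f" {site.and_op} ".join(parts), dropped
```

Returning the list of dropped features lets the interface tell the user that, for example, a date restriction could not be applied on a particular site.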
result set variability
Result sets are often presented in radically different formats, with
or without information explaining why an item was included or how the
items were ranked. Sometimes results are presented in small blocks;
sometimes advertisements and other unrelated material are included.
Merging such result sets is an enormous problem.
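A common first step toward merging is to convert every site's results into one record format, with one small adapter per site, discarding advertisements along the way. Both site formats below (site A's "link", "heading", "relevance" and "sponsored" keys, and site B's plain title/URL pairs) are invented for illustration; a real adapter would parse whatever the site actually returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """Common record that every site's hits are converted into."""
    site: str
    url: str
    title: str
    score: float | None  # None when the site gives no ranking information


def from_site_a(raw):
    # Hypothetical site A: dicts with a 0-100 "relevance" value; entries
    # flagged "sponsored" are advertisements and are dropped.
    return [Result("A", r["link"], r["heading"], r["relevance"] / 100)
            for r in raw if not r.get("sponsored")]


def from_site_b(raw):
    # Hypothetical site B: (title, url) pairs in rank order, no scores.
    return [Result("B", url, title, None) for title, url in raw]


ADAPTERS = {"A": from_site_a, "B": from_site_b}


def normalize(site, raw):
    """Convert one site's raw results into a list of Result records."""
    return ADAPTERS[site](raw)
```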
differing mechanisms for ranking documents
To merge result sets, either the sets must rank and order their items
in the same way, or each result set must be modified to conform to a
common set of formatting and ranking characteristics. It is often
difficult to determine exactly how a search engine ranks one document
above another. In fact, the ranking method is sometimes a closely
guarded trade secret.
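Because the engines' ranking methods are usually unknown, one common workaround (swapped in here for illustration, not taken from any particular system) is to rescale each site's scores to the range 0 to 1 with min-max normalization, and to fall back on rank position when a site reports no scores. The rescaled scores are only roughly comparable across sites, since each engine's raw scores may mean different things. Hits are assumed to be (url, score) pairs in each site's own rank order.

```python
def normalize_scores(hits):
    """Map one site's (url, score) hits, in rank order, to scores in [0, 1].

    Uses min-max scaling when every hit has a score; otherwise derives
    the score from rank position alone (first hit = 1.0).
    """
    n = len(hits)
    scores = [s for _, s in hits]
    if n and all(s is not None for s in scores):
        lo, hi = min(scores), max(scores)
        span = hi - lo
        return [(u, (s - lo) / span if span else 1.0) for u, s in hits]
    return [(u, (n - i) / n) for i, (u, _) in enumerate(hits)]


def merge(result_sets):
    """Merge several sites' hits into one list, best first.

    A document returned by more than one site keeps its highest score.
    """
    best = {}
    for hits in result_sets:
        for url, score in normalize_scores(hits):
            best[url] = max(score, best.get(url, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Keeping the highest score for a duplicate is one choice among several; summing the normalized scores instead (often called CombSUM) favors documents that several sites agree on.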
selecting sites to search
Some users may want to search every available site until they discover
that some sites are down and others contain nothing remotely related to
their query. Providing the user with a mechanism for preselecting
a subset of searchable indexes saves time and computing resources.
Think of it as the first step in refining a search.
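A minimal sketch of preselection follows. It assumes each searchable index is described by a set of subject tags and an availability flag kept current by some periodic status check; both the Site record and the site names are hypothetical, since the text does not say how sites are described.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical description of one searchable index."""
    name: str
    subjects: set[str] = field(default_factory=set)
    available: bool = True  # e.g. outcome of the most recent status check


def preselect(sites, wanted_subjects, include_unavailable=False):
    """Keep sites covering at least one requested subject that are not known
    to be down (unless the user explicitly asks for unavailable sites too)."""
    wanted = {s.lower() for s in wanted_subjects}
    return [
        site for site in sites
        if {s.lower() for s in site.subjects} & wanted
        and (site.available or include_unavailable)
    ]
```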
determining the characteristics of a search site
Many search sites look alike: they present a single text entry field
and a button to initiate the search. But behind the scenes, many web
search forms contain hidden form fields, database selection options, and
even elements that define the appearance of the result set. Most of
these items have nothing to do with searching, but the search CGI program
expects to receive them nonetheless, so any program that submits queries
to the site must supply them too.
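One way to discover these extra fields automatically is to parse the site's search form and send every hidden field back along with the user's query. The sketch below uses Python's standard html.parser and urllib.parse modules. It assumes the form uses the GET method and handles only input elements, so database selection menus (select elements) and POST forms would need additional code. FormInspector and build_search_url are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormInspector(HTMLParser):
    """Record the action URL, first text box, and hidden fields of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.text_field = None  # name of the first visible text box
        self.hidden = {}        # hidden field name -> default value
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = a.get("action") or ""
        elif tag == "input" and self._in_form:
            kind = (a.get("type") or "text").lower()
            name = a.get("name")
            if kind == "hidden" and name:
                self.hidden[name] = a.get("value") or ""
            elif kind == "text" and name and self.text_field is None:
                self.text_field = name

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_search_url(page_url, html, query):
    """Build a GET URL that submits `query` the way the site's own form would."""
    form = FormInspector()
    form.feed(html)
    if form.text_field is None:
        raise ValueError("no text input found in the first form")
    params = dict(form.hidden)       # echo hidden fields unchanged
    params[form.text_field] = query  # the only field the user actually fills in
    return urljoin(page_url, form.action) + "?" + urlencode(params)
```

Fed a form with hidden db and fmt fields and a text box named q, this produces a URL of the form .../cgi-bin/search?db=all&fmt=brief&q=river+pollution, which is what the CGI program would have received from a browser.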
how to handle delays and lags in response
Because searchable sites can be anywhere in the world, response time
can vary dramatically. Because merging results is difficult and
time-consuming, sites that are slow to respond or that fail to respond
at all must be abandoned within a reasonable period of time. Each
abandoned site should be reported to the end user, along with information
that lets the user decide whether to try the site again or remove it from
the list of sites to search.
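A minimal sketch of querying sites in parallel with a deadline, using Python's concurrent.futures module. Each site is represented by a hypothetical fetch(query) callable; sites that raise an error or miss the deadline are abandoned and listed in a report that can be shown to the user.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_all(sites, query, timeout):
    """Query every site in parallel; abandon sites that fail or miss the deadline.

    `sites` maps a site name to a hypothetical fetch(query) callable that
    returns a list of hits (a real one would send the HTTP request).
    Returns (results, report): hits per responding site, and a message per
    abandoned site that the user can use to decide whether to retry it.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sites)))
    futures = {pool.submit(fetch, query): name for name, fetch in sites.items()}
    start = time.monotonic()
    done, not_done = wait(futures, timeout=timeout)
    waited = time.monotonic() - start
    results, report = {}, {}
    for fut in done:
        name = futures[fut]
        try:
            results[name] = fut.result()
        except Exception as exc:  # fetching from this site failed outright
            report[name] = f"failed: {exc}"
    for fut in not_done:
        report[futures[fut]] = f"no response after {waited:.1f} s"
    # Return without waiting for slow sites; their worker threads are left
    # to finish in the background and their late answers are discarded.
    pool.shutdown(wait=False)
    return results, report
```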
where should most of the work be performed?
Because Java lets programs be written once and run anywhere, it is
now possible to ask the desktop system to perform some of the tasks,
such as result set merging and query mapping. But distributing the
workload may provide few benefits if the client is waiting on the
server, which is in turn waiting for ten or a hundred remote servers
to respond.