CALL FOR PARTICIPATION
TEXT RETRIEVAL CONFERENCE
January 2000 - November 2000
Conducted by:
National Institute of Standards and Technology (NIST)
With support from:
Defense Advanced Research Projects Agency (DARPA)
The Text Retrieval Conference (TREC) workshop series encourages research in information retrieval from large text applications by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Now in its ninth year, the conference has become the major experimental effort in the field. Participants in the previous TREC conferences have examined a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. Other related problems such as cross-language retrieval, retrieval of recorded speech, and question answering have also been studied. Details about TREC can be found at the TREC web site, http://trec.nist.gov .
You are invited to participate in TREC-9. TREC-9 will consist of a set of seven parallel tasks known as "tracks". Each track focuses on a particular subproblem or variant of the retrieval task as described below. Organizations may choose to participate in any or all of the tracks. For most tracks, training and test materials are available from NIST; a few tracks will use special collections that are available from other organizations for a nominal fee. For all tracks, NIST will collect and analyze the retrieval results.
Dissemination of TREC work and results other than in the (publicly available) conference proceedings is welcomed, but the conditions of participation preclude specific advertising claims based on TREC results. As before, the workshop in November will be open only to participating groups that submit results and to selected government personnel from sponsoring agencies.
Schedule:
---------
By February 1, 2000 -- submit application described
below to NIST.
Returning an application will add you to the active
participants' mailing list. On Feb 1, NIST will
announce a new password for the "active participants"
portion of the TREC web site. Included in this portion
of the web site is information regarding the permission
forms needed to obtain the TREC document disks.
Beginning February 8 -- document disks distributed to those new
    participants who have returned the required forms. There
    is a total of 5 CD-ROMs containing about 5 gigabytes of
    text. In addition, 450 training topics (questions) and
    relevance judgments are available from NIST. Please
    note that no disks will be shipped before February 8.
August 2 -- earliest results submission deadline.
August 30 -- latest results submission deadline.
(Results deadline will vary by track. The Web track
deadline will be August 2. Deadlines for other tracks
are still to be determined, but will be sometime in
August.)
September 7 -- speaker proposals due at NIST.
October 5 -- relevance judgments and individual evaluation
scores due back to participants.
Nov. 13-16 -- TREC-9 conference at NIST in Gaithersburg, Md.
Task Description:
-----------------
Below is a brief summary of the tasks. Complete descriptions of tasks performed in previous years are included in the Overview papers in each of the TREC proceedings (in the Publications section of the web site).
For most tracks, the exact definition of the tasks to be performed in the track for TREC-9 is still being formulated. Track discussion takes place on the track mailing list. To be added to a track mailing list, send a request to the Mailing List Contact Address listed below. For questions about the track, send mail to the track coordinator (or post the question to the track mailing list once you join).
Cross-Language Track -- a track that investigates the ability of
    retrieval systems to find documents that pertain to a topic
    regardless of the language in which the document is written.
    In previous TRECs, the cross-language track involved documents
    written in English, German, French, or Italian. Starting in
    2000, the investigation of cross-language retrieval for
    European languages will have its own evaluation known as CLEF
    (the Cross-Language Evaluation Forum). More details about CLEF
    can be found at the CLEF web site,
    http://www.iei.pi.cnr.it/DELOS/CLEF .
    In TREC-9, the cross-language track will use English and
    Mandarin documents and English topics. Depending on data
    availability, the track may also involve Tamil and Malay
    documents.
    Track coordinator: Donna Harman, donna.harman@nist.gov
    Mailing list contact address: erika.ashburn@nist.gov
Filtering Track -- a task in which the user's information need is
    stable (and some relevant documents are known) but there is a
    stream of new documents. For each document, the system must
    make a binary decision as to whether the document should be
    retrieved (as opposed to forming a ranked list).
    Track coordinators: David Hull, david.hull@xrce.xerox.com and
        Steve Robertson, ser@microsoft.com
    Mailing list contact address: lewis@research.att.com
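The binary yes/no decision that distinguishes filtering from ranked retrieval can be illustrated with a minimal sketch. The scoring function and threshold below are purely illustrative assumptions, not part of the TREC-9 track definition:

```python
# Hypothetical sketch of a filtering decision: a stable profile,
# a stream of documents, and a yes/no call per document.

def score(profile_terms, document_terms):
    """Toy relevance score: term overlap between the stable
    information need (profile) and an incoming document."""
    return len(profile_terms & document_terms)

def filter_stream(profile_terms, documents, threshold=2):
    """For each document in the stream, emit a binary retrieval
    decision rather than a position in a ranked list."""
    for doc_id, text in documents:
        terms = set(text.lower().split())
        yield doc_id, score(profile_terms, terms) >= threshold

profile = {"cross-language", "retrieval", "evaluation"}
stream = [("d1", "cross-language retrieval evaluation results"),
          ("d2", "conference hotel registration details")]
decisions = dict(filter_stream(profile, stream))
# decisions == {"d1": True, "d2": False}
```

A real filtering system would also adapt its threshold as judged documents arrive; this sketch only shows the shape of the task.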
Interactive Track -- a track studying user interaction with text
    retrieval systems. This year's track will use the Web document
    collection and a task similar (but not identical) to the
    Question Answering track. All participating groups follow a
    common experimental protocol that provides insights into user
    searching.
    Track coordinator: Bill Hersh, hersh@ohsu.edu
    Mailing list contact address: hersh@ohsu.edu
Query Track -- a track designed to foster research on the effects
    of query variability and analysis on retrieval performance.
    Each participant constructs several different versions of
    existing TREC topics. All groups then run all versions of the
    topics.
    Track coordinator: Chris Buckley, chrisb@sabir.com
    Mailing list contact address: chrisb@sabir.com
Question Answering Track -- a track designed to take a step
    closer to *information* retrieval rather than *document*
    retrieval. For each of a set of 500 questions, systems produce
    a text extract that answers the question. Different runs will
    have different limits on the maximum length of the extract,
    including a short phrase (a few words), 50 bytes, and 250
    bytes.
    Track coordinators: Amit Singhal, singhal@research.att.com and
        Tomek Strzalkowski, strzalkowski@crd.ge.com
    Mailing list contact address: singhal@research.att.com
Spoken Document Retrieval Track -- a track that investigates the
    effects of speech recognition errors on retrieval performance.
    The task to be performed in TREC-9 is still to be determined.
    Please contact the track coordinator as soon as possible if
    you are interested in this track.
    Track coordinator: John Garofolo, john.garofolo@nist.gov
    Mailing list contact address: john.garofolo@nist.gov
Web Track -- a track featuring ad hoc search tasks on a document
    set that is a snapshot of the World Wide Web. The main focus
    of the track will be to form a Web test collection using
    pooled relevance judgments. The document set will be a 10GB
    subsample of the existing VLC2 document set. Topics will be
    created at NIST by taking queries from search engine logs and
    retrofitting topic statements around them. (Thus, the true
    web query will be there, but there will also be a narrative
    explaining how it will be judged.) Relevance judgments will
    then be made using the traditional TREC pooling methodology,
    with NIST assessors doing the judging.
    Track coordinator: David Hawking, David.Hawking@cmis.csiro.au
    Mailing list contact address: David.Hawking@cmis.csiro.au
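The pooling methodology mentioned above can be sketched in a few lines: the set of documents judged for a topic is the union of the top-k documents from each submitted run. The run names and pool depth here are illustrative assumptions only:

```python
# Hedged sketch of TREC-style pooling: assessors judge only the
# union of the top-k documents from each participating run, not
# the whole collection. Run names and k are hypothetical.

def build_pool(runs, k=100):
    """runs: mapping of run name -> ranked list of document ids.
    Returns the set of documents the assessors would judge."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])  # top-k from this run join the pool
    return pool

runs = {
    "runA": ["d3", "d1", "d7"],
    "runB": ["d1", "d9", "d2"],
}
pool = build_pool(runs, k=2)
# pool == {"d3", "d1", "d9"}
```

Documents outside the pool are treated as not relevant for scoring purposes, which is why broad participation improves the resulting test collection.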
Conference Format:
------------------
The conference itself will be used as a forum both for
presentation of results (including failure analyses and system
comparisons) and for more lengthy system presentations describing
retrieval techniques used, experiments run using the data, and
other issues of interest to researchers in information retrieval.
As there is a limited amount of time for these presentations, the
program committee will determine which groups are asked to speak
and which groups will present in a poster session. Groups
interested in having a speaking slot during the workshop will
submit a 200-300 word abstract in September describing the
experiments they performed. The program committee will use these
abstracts to select speakers.
As some organizations may not wish to describe their proprietary
algorithms, TREC defines two categories of participation.
Category A: Full participation. Participants will be expected to
present full details of system algorithms and the various
experiments run using the data, either in a talk or in a poster
session.
Category C: Evaluation only. Participants in this category will be
expected to submit results for common scoring and tabulation, and
to present their results in a poster session. They will not be
expected to describe their systems in detail, but will be expected
to report on time and effort statistics.
Data:
-----
The existing TREC English collections (documents, topics, and
relevance judgments) are available for training purposes and will
also be used in some of the tracks. Parts of the training
collection (Disks 1-3) were assembled from Linguistic Data
Consortium text, and a signed User Agreement will be required from
all participants. The documents are an assorted collection of
newspapers, newswires, journals, and technical abstracts. A
separate Agreement is needed for the remaining disks (4-5).
All documents are typical of those seen in a real-world situation
(i.e., there will not be arcane vocabulary, but there may be
missing pieces of text or typographical errors). The relevance
judgments against which each system's output will be scored will
be made by experienced relevance assessors based on the output of
all TREC participants, using a pooled relevance methodology.
Response format and submission details:
---------------------------------------
Organizations wishing to participate in TREC-9 should respond to
this call for participation by submitting an application. An
application consists of four parts: contact information, a
one-paragraph description of your retrieval approach, whether you
will participate as a Category A or a Category C group, and a list
of the tracks you are likely to participate in. Contact
information includes a full postal address, voice and fax
telephone numbers, and the email address of the one person in the
organization who will be the main TREC contact. Please note that
email is the only method of communication in TREC. Participants
in TREC-8 who will participate in TREC-9 should also submit an
application.
All responses should be submitted by February 1, 2000 to Ellen Voorhees, TREC project leader, at ellen.voorhees@nist.gov . Any questions about conference participation, response format, etc. should be sent to the same address.
Program Committee
-----------------
Ellen Voorhees, NIST, chair
James Allan, University of Massachusetts, Amherst
Nick Belkin, Rutgers University
Chris Buckley, Sabir Research, Inc.
Jamie Callan, Carnegie Mellon University
Susan Dumais, Microsoft
Donna Harman, NIST
David Hawking, CSIRO, Australia
Bill Hersh, Oregon Health Sciences University
Darryl Howard, U.S. Department of Defense
David Hull, Xerox Research Center Europe
John Prange, U.S. Department of Defense
Steve Robertson, Microsoft
Amit Singhal, AT&T Labs Research
Karen Sparck Jones, Cambridge University, UK
Tomek Strzalkowski, GE
Ross Wilkinson, CSIRO, Australia