Syllabus for CS5604, Fall 1995 - No. VA
General Information
- Course Name: Information Storage and Retrieval
- Course Number: CS5604
- Index Number: 0510
- Location: Room 436 usually, else in computer lab
- Prerequisites: CS2604 or permission of instructor
- Time: Mon. 6:30-9:00pm
Catalog Description for CS5604
Analyzing, indexing, representing, storing, searching, retrieving, and
presenting desired information.
Models, document processing, thesauri, evaluation of system
effectiveness, special hardware.
Boolean logic and inverted file systems. Fully automatic systems. Role
of probability, artificial
intelligence and computational linguistics.
Additional Course Description for CS5604
Explanation/demonstration of: online bibliographic services; library
systems like VTLS, MARIAN;
retrieval systems like WAIS, CODER; hypertext systems like WWW, Hyper-G,
KMS; digital multimedia with JPEG, MPEG;
applications of artificial intelligence in knowledge bases and information
systems; text processing, electronic publishing, automatic indexing.
Course theme: "digital libraries."
Objectives for CS5604
Prepare students to:
- Commence Master's and/or PhD-level research in the broad area of
information storage and retrieval (IS&R).
- Read and understand research contributions in this area.
- Critique, contrast, compare, and evaluate the efficiency,
effectiveness, and utility of
commercially available and research prototype systems for IS&R.
- Select, implement, or design and develop algorithms and data
structures for IS&R systems, including digital libraries.
- Effectively use indexing, analysis, search, hypertext, and multimedia
access systems for common tasks.
- Communicate effectively using writing and hypertext techniques
to demonstrate mastery of course subject matter.
Grading for CS5604
Attention to reading, labs, demonstrations and class discussions is
imperative.
Students must
demonstrate mastery of a body of knowledge and its application.
The course has 11 units, worth a
total of 135 points. Units CL, DL, IN, IR, SD, SS each are worth 10
points; units HT, IF, KB, MM,
RR each are worth 15 points. You must demonstrate mastery (e.g., quiz
grade of 90%) of each unit
you wish credit for, and will be allowed two retries (against different
questions) or an oral exam on
each unit.
Mastering a unit gives you full credit as long as you have
also completed (possibly in a
group) all assignments for that unit. The open book, open notes final
is worth 100 points and will
have at least 50 points worth of questions taken from unit tests.
Extra credit is given for high-quality work on writing assignments. For each
of the debate topics, the discussion statement judged by the class to
be best earns all involved group members one bonus point.
Similarly, the three best
daily summaries from each class earn their writers a bonus point; students
can gain a maximum of 5 points during the semester this way. Finally, the
instructor may elect to give a bonus point for an exceptionally well done
exercise reply, suggestion, annotation, or similar contribution.
A grade of
- B will be given for an accumulation of at least 210 points,
- B+ for 216,
- A- for 222, and
- A for 228.
The
instructor reserves the
right to adjust grades for unusual performance on the final. There may
be extra credit assignments.
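The point arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated numbers: the unit values and grade cutoffs come from this syllabus, while the function names and the sample final exam score are hypothetical, and the sketch ignores any adjustment the instructor may make.

```python
# Unit point values as listed in the syllabus.
UNIT_POINTS = {
    "CL": 10, "DL": 10, "IN": 10, "IR": 10, "SD": 10, "SS": 10,
    "HT": 15, "IF": 15, "KB": 15, "MM": 15, "RR": 15,
}

# Minimum point totals for each letter grade, highest first.
GRADE_CUTOFFS = [(228, "A"), (222, "A-"), (216, "B+"), (210, "B")]


def course_total(mastered_units, final_exam_points, bonus_points=0):
    """Sum unit credit, final exam points (0-100), and any bonus points."""
    unit_points = sum(UNIT_POINTS[unit] for unit in mastered_units)
    return unit_points + final_exam_points + bonus_points


def letter_grade(total_points):
    """Return the letter grade for a total, or None below the B cutoff."""
    for cutoff, grade in GRADE_CUTOFFS:
        if total_points >= cutoff:
            return grade
    return None  # the syllabus does not list grades below B


all_units_total = sum(UNIT_POINTS.values())            # 6*10 + 5*15 = 135
total = course_total(UNIT_POINTS, 80, bonus_points=2)  # 135 + 80 + 2 = 217
print(all_units_total, total, letter_grade(total))     # 135 217 B+
```

With all 11 units mastered, a student needs at least 75 of the 100 final exam points (before bonuses) to reach the 210 points required for a B.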
No. VA students get a 10 point bonus if they master working
with the VTEL system. This can be accomplished in groups by
helping turn on the system, fixing things if there are problems,
and/or re-assigning the camera/monitor arrangement as requested.
The Virginia Tech Honor Code
The Honor Code will be strictly enforced in this course.
All assignments submitted shall be considered graded work,
unless otherwise noted.
All aspects of your coursework are covered by the Honor System.
Any suspected violations of the Honor Code will be promptly reported to
the Honor System.
Honesty in your academic work will develop into professional integrity.
The faculty and students of Virginia Tech will not tolerate any form
of academic dishonesty.
Instructor Information
- Home: 203 Craig Dr., Blacksburg, VA 24060;
+1-540-552-8667
- Email: fox@vt.edu
- WWW: http://fox.cs.vt.edu/
- Title: Professor
- Room: 608 McBryde Hall
- Office Hours: Tu 12:30-3:30pm, Th 12:30-2:30pm
- Phone: +1-540-231-5113
- FAX: +1-540-231-6075
- Address: 660 McBryde Hall, Virginia Tech, Blacksburg,
VA 24061-0106
Textbooks for CS5604
- Required:
Frakes, William B. and Ricardo Baeza-Yates, editors.
Information Retrieval: Data Structures & Algorithms.
Englewood Cliffs, NJ: Prentice-Hall, 1992,
viii, 504. ISBN 0-13-463837-9.
Call Number QA76.9.D35 I543 1992
[Skip chapters 4, 6, 9, 13, and 17-18.]
- Recommended:
Salton, Gerard.
Automatic text processing: the transformation, analysis,
and retrieval of information by computer.
Reading, Mass.: Addison-Wesley, 1989,
xiii, 530.
Call Number QA76.9.T48 S25 1989
Readings and References for CS5604
Some of the best hypertext readings are available through KMS
in the ACM Hypertext Compendium, which starts at frame
ACMHTCtop1
on video.cs.vt.edu, or are pointed to by the excellent bibliography
available from Texas A&M University, which starts at frame
tamuhrl92001.1
on video.cs.vt.edu.
Aside from the textbook, all other materials mentioned have been
placed on reserve. These are all valuable items for the study of
Information Storage and Retrieval.
Articles for CS5604
The following articles are required or recommended reading to go along
with the various units of the course, much like the various selected
chapters of the textbook.
They can be found on reserve in the Library, or can be read online using
xtiff or xprcedit when referring to the proper files on
video.cs.vt.edu in
/u4/pages/acm/cacm with pathname suffixes given below. Note that
/u4/pages has a large number of articles from
ACM and IEEE-CS that have been scanned in and can be referred to.
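As a rough illustration of the pathname scheme just described, the sketch below joins the base directory with a suffix of the form used in this list. The function name and the assumption that issue numbers are always zero-padded to two digits are mine, inferred from the entries in this syllabus; whether each resulting path is a file or a directory of page images is not stated here.

```python
import posixpath

# Base directory named in the syllabus for scanned CACM articles.
CACM_BASE = "/u4/pages/acm/cacm"


def article_path(volume, number, article_id):
    """Build the full path for an article from its volume, issue, and ID.

    Assumes issue numbers are zero-padded to two digits, as in "n05".
    """
    suffix = "v{}/n{:02d}/{}".format(volume, number, article_id)
    return posixpath.join(CACM_BASE, suffix)


print(article_path(30, 11, "COOM87"))  # /u4/pages/acm/cacm/v30/n11/COOM87
print(article_path(30, 5, "DONG87"))   # /u4/pages/acm/cacm/v30/n05/DONG87
```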
- COOM87: v30/n11/COOM87
Coombs, Renear, and DeRose. Markup
Systems and the Future of Scholarly Text Processing.
CACM 30(11): 933-947, Nov. 1987.
- DONG87: v30/n05/DONG87
Dongarra and Grosse. Distribution of Mathematical Software via
Electronic Mail.
CACM 30(5): 403-407, May 1987.
- FOXE88d: v31/n08/FOXE88d
Fox. ACM Press Database and Electronic Products -- New Services for
the Information Age.
CACM 31(8): 948-951, Aug. 1988.
- FOXE91b: v34/n04/FOXE91b
Fox. Standards and the Emergence of Digital Multimedia Systems.
CACM 34(4): 25-29, April 1991.
- FURN87: v30/n11/FURN87
Furnas, Landauer, Gomez and Dumais. The Vocabulary Problem in
Human-System Communication.
CACM 30(11): 964-971, Nov. 1987.
- GREE92: v35/n01/GREE92
Green. The Evolution of DVI System Software.
CACM 35(1): 52-67, Jan. 1992.
- HAAN92: v35/n01/HAAN92
Haan, Kahn, Riley, Coombs, and Meyrowitz. IRIS Hypermedia Services.
CACM 35(1): 36-51, Jan. 1992.
- LEBO88: v31/n12/LEBO88
Lebowitz. The Use of Memory in Text Processing.
CACM 31(12): 1483-1502, Dec. 1988.
- MALO87a: v30/n05/MALO87a
Malone, Grant, Turbak, Brobst and Cohen. Intelligent
Information-Sharing Systems.
CACM 30(5): 390-402, May 1987.
- MAMR87: v30/n05/MAMR87
Mamrak, Kaelbling, Nicholas and Share. A Software Architecture for
Supporting the Exchange of Electronic Manuscripts.
CACM 30(5): 408-414, May 1987.
- NIEL90a: v33/n03/NIEL90a
Nielsen. The Art of Navigating Hypertext.
CACM 33(3): 296-310, Mar. 1990.
- PHIL91a: v34/n07/PHIL91a
Phillips. MediaView: A General Multimedia Digital Production System.
CACM 34(7): 75-83, July 1991.
- SALT75b: v18/n11/SALT75b
Salton, Wong and Yang. A Vector Space Model for Automatic Indexing.
CACM 18(11): 613-620, Nov. 1975.
- SALT83d: v26/n11/SALT83d
Salton, Fox and Wu. Extended Boolean Information Retrieval.
CACM 26(11): 1022-1036, Nov. 1983.
- SALT86a: v29/n07/SALT86a
Salton. Another Look at Automatic Text-Retrieval Systems.
CACM 29(7): 648-656, July 1986.
- SAMU91a: v34/n10/SAMU91a
Samuelson. Digital Media and the Law.
CACM 34(10): 23-28, Oct. 1991.
- WALL91: v34/n04/WALL91
Wallace. The JPEG Still Picture Compression Standard.
CACM 34(4): 30-44, April 1991.
Units of CS5604
There are 11 units in this course, each with a 2-letter ID that
symbolizes the main topical area considered. Each unit will be covered
in 1-2 weeks of class time, has a set of associated readings, and has some
lab or homework exercises that must be completed.
Either 10 or 15 points, depending on the time and difficulty of the
unit, will be given when mastery is demonstrated by a quiz
grade of at least 90%.
- DL: Digital Libraries
FOXE88d, SAMU91a, DONG87
8/21 (10 points)
- IR: Information Storage & Retrieval
Ch1, SALT86a, Ch2
8/28 (10 points)
- IF: Inverted Files / Boolean Systems
Ch3, Ch12, Ch15, SALT83d
9/4, 9/11 (15 points)
- SS: String Searching
Ch5, Ch10
9/11, 9/18 (10 points)
- RR: Ranking / Relevance Feedback
Ch14, SALT75b, Ch11
9/18, 9/25 (15 points)
- CL: Clustering
Ch16
10/2 (10 points)
- IN: Indexing / Document Analysis
Ch7, Ch8
10/9 (10 points)
- SD: SGML / Document Translation
COOM87, MAMR87
10/16 (10 points)
- HT: Hypertext
NIEL90a, HAAN92
10/23, 10/30 [videotapes] (15 points)
- MM: Multimedia
FOXE91b, WALL91, PHIL91a, GREE92
11/6 [library, videotapes], 11/13 (15 points)
- KB: Knowledge-Based Information Retrieval
FURN87, LEBO88, MALO87a
11/13, 11/27 (15 points)