8/31 Summaries

NOTE: Credit given to all but the last one.

Theodore David

At the beginning of the period Dr. Fox informed us of the death of Dr. Salton. Dr. Salton was one of the pioneers in the field of Computer Science, and especially in the area of Information Storage & Retrieval. Three of the articles that we are required to read in CS5604 were written by Dr. Salton. In his memory, Dr. Fox went through some of his articles.

After that we went through questions that students had sent to Dr. Fox about the course. This was followed by a quick review of the IITA Digital Libraries Workshop, held on May 18-19, 1995.

The professor continued with the presentation of the last part of the Information Retrieval (IR) unit. We resumed at the Blair & Maron article, the point where we had stopped last lecture. A partial Timeline of Progress in IS&R was the next topic. Some interesting algorithms were mentioned. This was followed by a Functional and Topical View of IR. For the latter, Dr. Fox explained the foundations of IS&R and how they affect the field. They are: Technology; Library & Information Science; Experimental Computer Science; and Data Structures & Algorithms.

The class ended with a brief review of the required readings for this unit.

Rick Compton

Today's class started with the reading of an article from the Ithaca Journal about Gerard Salton, who passed away this week. Dr. Fox then mentioned some papers with which Salton was involved.

Next we moved into the area of questions. Quizzes should be available by tomorrow. Knowledge of UNIX will not play a large role in this class. Passwords on the fox machine may be the same as those on csgrad. If more information about netlib is needed, files can be downloaded to help. Lastly, anyone who is not comfortable with the knowledge acquired from a unit's studies should see Dr. Fox personally for help.

An article from the IITA Digital Libraries Workshop is now listed in the News/Announcement section. Topics in the paper included interoperability and infrastructure requirements.

Returning to the IR unit, we considered the Blair & Maron article. Questions were raised about what 'vocabularies' should be used for indexing, whether indexing should be automatic, and how lab results scale up.

We examined a timeline that went from superimposed coding in 1949 to PAT arrays in 1987. Dr. Fox gave detailed explanations of hashing and tries. Once again, we visited the functional view of IR and examined topical views in detail. Concerning chapter 1 of the text, we looked at Domain Analysis and IR vs. Other Systems. After a glance at chapter 2, class ended.

Sadanand K. Sahasrabudhe

The class began on a rather sad note, given the news that Dr. Salton had passed away two days ago. He was one of the leading researchers in the field of IS & R and was with the Department of Computer Science at Cornell. Some of his publications were briefly reviewed.

The discussion of Unit 2 was then continued. The Blair & Maron controversy (which was introduced in the last class) was discussed. The controversy seems to be mostly over the conclusions that the article drew. Those conclusions were later rebutted by different people in different articles.

The discussion then shifted to the chronological progress of IS & R: which data structures and associated algorithms were introduced, and during what period. The concepts of hashing and tries were discussed in detail, along with their advantages and disadvantages.

The two views of the field of IS & R, functional and topical, were then discussed. The functional view explains the different parts of an IS & R system (including the user) and how they relate to each other. The topical view explains which different areas of Computer and Information Science are covered under IS & R. It was surprising to note how all-encompassing the field is, with everything from Networking to Algorithms and AI to Languages included under it.

The class ended with a brief discussion of the first two chapters of the textbook and the article by Salton. An important point was the distinction between IR systems and other systems such as AI systems and Database Management Systems.

Binh-Minh Tran

Today, at the beginning of class, I learned a few things about Gerard Salton, a man who contributed greatly to the computer science community and who passed away in the last few days. Salton attended Brooklyn College. He came to work at Cornell University in 1965, when this now famous school of computer science had only a punched-card computer. Most of his work concentrated on information retrieval and intermedia systems. His collected work included 5 textbooks and more than 500 articles. Salton was also an ACM council member. He earned many awards for his contributions to computer science. I will read his reprinted article, "Another Look at Automatic Text-Retrieval Systems", in the next couple of days, not only because it is one of our reading assignments but also to appreciate his work and contribution.

The next item that we discussed was the questions that some students had relating to the assignments. Dr. Fox answered every question. His answers cleared up things that I was not certain about and confirmed what I had already understood. This process is very helpful because although Dr. Fox explained those to us in previous lectures, some of us needed clarification. The answers to the questions relating to the assignments helped us pick up any missing information.

The news about the DL workshop in May was brought up next. Dr. Fox briefly went over the items covered in the workshop. I will read the news to understand more about it.

The last item, also the main one, was the IR lecture. We picked up at the Blair & Maron article, which was partly covered last time. From the lecture, I learned the following:

1. Blair & Maron argue that it is NOT the case that:
   - free text is better than controlled vocabulary
   - automatic is better than manual indexing
   - things would scale-up

However, they did not have evidence to support their arguments.

2. Partial timeline of progress
   - superimposed coding is the technique behind signature files
   - refreshing my memory about hashing:
     * a method to access stored information
     * use of math functions to compute the location where some 
       information is stored
     * if a collision occurs, a linked list is used, or other functions
       are used to find another empty slot in the hash table
     * hashing gives near-ideal lookup performance when collisions are rare
     * hashing uses storage efficiently
   - Tries
     * recursive tree
     * use of decomposition of strings to store and search
     * requires much more space than hashing

   Other items were briefly covered.
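
   To make item 2 concrete, here is a minimal Python sketch of the two structures. It is only an illustration and was not shown in the lecture: the class names, method names, and the simple multiply-by-31 string hash are all assumptions chosen for the example. The hash table resolves collisions with per-slot chains (the "linked list" option above), and the trie stores each word one character per level.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative only)."""

    def __init__(self, num_slots=8):
        # One chain (a Python list standing in for a linked list) per slot.
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key):
        # Assumed hash function: a simple polynomial hash over the characters.
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) % len(self.slots)
        return h

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already stored: overwrite it
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key (possibly a collision)

    def get(self, key, default=None):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default


class Trie:
    """Trie node; each node is itself the root of a smaller trie."""

    def __init__(self):
        self.children = {}    # character -> child Trie
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.is_word = True

    def _walk(self, text):
        # Follow text character by character; None if the path breaks off.
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


# Small usage example: two slots force collisions among three keys.
table = ChainedHashTable(num_slots=2)
for position, term in enumerate(["index", "term", "query"]):
    table.put(term, position)

trie = Trie()
for word in ["retrieve", "retrieval"]:
    trie.insert(word)
```

   The trie allocates a node for every character of every stored word, which is one way to see the space cost noted above; in exchange it can answer prefix questions (has_prefix) that a hash table cannot answer without scanning all of its keys.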

3. Functional view of IR
   We looked at the diagrams and went through the topic hierarchy of
   technology, library & information science, experimental computer
   science, and data structures and algorithms

Chapter 1, chapter 2, and the Salton article were briefly introduced. They are our reading assignments. Chapter 2 should be read to refresh our memories and will not be covered in class.

Sirirut Vanichayobon

At the beginning of the class, Dr. Fox talked about his teacher, Professor Gerard Salton, who had recently passed away. After that, we continued studying the IR unit. He asked the students about hashing. No one gave a good answer, so he explained it in a simple style, with a picture that I can imagine and remember. He also explained how to take a quiz: students have to give their names and IDs when they send in their answers.