Computer Exercise

A file of sample data is provided in the following format:

  1. the first row has an integer indicating the number of documents;
  2. then there are that many rows of data;
  3. each row is one row of the lower-left triangle of a symmetric matrix (which is square, so it has as many rows as columns);
  4. each row ends with 1.0 for the diagonal entry;
  5. each entry is a real-number similarity in the range [0,1].

The matrix represents the pairwise similarities between the documents in a collection.
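
To make the format concrete, here is a minimal sketch of a reader that expands the triangle into a full symmetric matrix. It is only an illustration: the function name parse_lower_triangle is hypothetical, and it assumes that the values on each row are separated by whitespace and that row i holds exactly i entries (columns 1 through i, ending with the diagonal). Check both assumptions against the actual data file before relying on it.

```python
def parse_lower_triangle(text):
    """Expand lower-triangle similarity data into a full symmetric matrix.

    Assumptions (not stated in the handout): values are whitespace-separated,
    and row i (1-based) holds exactly i entries, the last being the diagonal.
    """
    # Ignore blank lines so a trailing newline does not count as a row.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")

    n = int(lines[0].split()[0])          # first row: number of documents
    rows = lines[1:n + 1]                 # then that many rows of data
    if len(rows) != n:
        raise ValueError(f"expected {n} data rows, found {len(rows)}")

    sim = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(rows):
        values = [float(tok) for tok in line.split()]
        if len(values) != i + 1:
            raise ValueError(f"row {i + 1}: expected {i + 1} values, found {len(values)}")
        if values[-1] != 1.0:
            raise ValueError(f"row {i + 1}: diagonal entry is not 1.0")
        for j, v in enumerate(values):
            if not 0.0 <= v <= 1.0:
                raise ValueError(f"row {i + 1}: similarity {v} is outside [0,1]")
            sim[i][j] = v                 # lower triangle, as given
            sim[j][i] = v                 # mirror into the upper triangle
    return sim
```

For example, calling parse_lower_triangle(open(path).read()), where path is wherever you saved the sample file, returns a list of lists in which sim[i][j] is the similarity between documents i and j (counting from 0).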

If you elect to cluster by hand, use the given similarity matrix as the input to your algorithm. Explain your steps, and provide drawings, sketches, or tables that illustrate the intermediate and final results.

Alternatively, you can write, test, and turn in the code and output for a program that implements one of the clustering algorithms given in the textbook. Be sure to document how to use the routine and how it works, and cite the textbook pages on which you base your processing.
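
As a rough idea of the shape such a program might take, the sketch below performs single-link agglomerative clustering on a full symmetric similarity matrix held as a list of lists. Single link is used here only as a familiar example and is not necessarily one of the algorithms in the textbook; the function name single_link_clusters and the stopping threshold parameter are hypothetical choices made for illustration. It recomputes cluster similarities on every pass, which is simple but slow for large collections.

```python
def single_link_clusters(sim, threshold):
    """Single-link agglomerative clustering over a similarity matrix.

    sim is a full symmetric matrix (list of lists) with values in [0,1].
    Clusters are merged, most similar pair first, until the best remaining
    similarity falls below threshold (a hypothetical stopping rule).
    Returns (clusters, merges), where merges records each step taken.
    """
    clusters = [[i] for i in range(len(sim))]   # start: one document per cluster
    merges = []

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster similarity is that of the closest pair.
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)

        s, a, b = best
        if s < threshold:
            break                               # nothing similar enough remains

        merges.append((s, clusters[a], clusters[b]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(clusters), merges
```

The merges list gives the similarity and the two clusters joined at each step, which is the kind of partial result the write-up should show. Setting threshold to 0.0 runs the merging all the way down to a single cluster.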

A third option is to find and use a software package that supports clustering. Try, for example, the JMP package, which may run on the Mac systems in the lab. Use the data described above, and turn in a description of the system you used, the algorithm(s) chosen, the steps carried out, and the final results.


fox@cs.vt.edu
Oct 22 1996