CS 5014 -- Term Project -- Fall 1995

Version 1

Work in groups of preferably three to complete the following assignment.

All items that you turn in (except for the literate program) must use a 12 point font. All items that you turn in except for the literate program must be double spaced. All items that you turn in be prepared with LaTeX or TeX.

Milestone 1 -- HW10, Problem 2 (to be done individually):

Perform a literature search and identify one algorithm to perform compression on a plain ASCII text file that you believe you can implement. Send an e-mail message to keller@csgrad.cs.vt.edu, with the subject line "HW10 Citation," containing

  1. a citation giving the state of the algorithm (e.g., in a book, a journal, or in a URL in the WWW),

  2. the name of the algorithm (e.g, Huffman compression, LZ78), and

  3. whether the citation contains partial or full source code implementing the compression method.

(Unless too many students select the algorithm that you select, you will implement this algorithm as a literate program in HW11 as the first stage in your term project.) Choose an algorithm that meets the following two criteria:

We will allow at most five people to implement the same algorithm. Therefore the GTA will reply by e-mail telling you if you are one of the first five persons choosing the citation. If you are not, you must choose another paper and repeat this procedure.

Milestone 2 -- HW11 [to be done individually]:

Write a literate program implementing the compression algorithm that you proposed and was approved by the GTA. The compression program must be named "c5014" and must take one command line option, which is the name of the file to compress. If the command line is


c5014 myfile

then the program must create compressed file myfile.c. Program c5014 must not delete the original file.

Create a documented test suite that you use to verify its correctness and a brief explanation of how to use the program. Test your program on both csgrad and sequoia, and test with two different compilers, and that test with all possible optimization options. This will minimize the chance that a subsequent user could break your program. Write the test suite with the expectation that one day someone will modify your code and will wish to run it though the test suite again. Describe the inputs and the expected outputs (thereby permitting use of regression testing). Devise a good test strategy, or look in a book on software engineering if you know of no test strategies. Turn in the following:

  1. a hardcopy of the literate program

  2. an electronic copy: In directory /actor/export/abrams2/cs5014 on csgrad and other DECstations, create a subdirectory whose name is your last name, first initial (abramsm for the instructor), and then put whatever files (including a makefile, if used) in the directory that you create.

The "documented test suite" should give all test cases that you used to convince yourself of why your program is correct, along with (a) a justification for why you chose the data and (b) the expected output for each input value.

Here are some example questions to consider in formulating your test suite:

Does your test suite exercise all possible paths through your program? (Explain why the test data will do this.) Does the test suite exercise any boundary condition? Did the paper or book in which you found the algorithm exploit any special properties of the input file? If so, this may suggest other values to include in your test suite.

Milestone 3 -- HW12:

Imagine that upon graduation from Virginia Tech you start a consulting firm. Your first client hires you to evaluate a set of algorithms and prepare a report that concludes with recommendations characterizing which algorithm works the best for which input data values.

The instructor will give the literate programs of nine students to each group of three in the class. (Different students will receive different programs.) The nine programs will implement three different algorithms. Design and carry out a computational experiment to compare the performance of the three algorithms. Pay particular attention to distinguishing conclusions about the programs versus the algorithms. Because you will have three implementations of each algorithm, you can design experiments to identify whether a property is common to all implementations (and thus is likely to be a property of the algorithm) or not.

For example, if program A is measured to run in half the time of program B on the same input data, does this mean that algorithm A is twice as fast as algorithm B? The answer is "yes" only if all implementations of algorithms A would run half as fast as algorithm B. Turn in an experiment design description which does not exceed 4 pages in length, including tables and figures, and an outline of no more than two pages.

Your experiment design description should list:

Milestone 4 -- due on last class meeting:

Carry out the experiments. Write a paper whose length does not exceed 20 pages, excluding figures, in a style appropriate for IEEE Transactions on Computers (but use one column format). The paper should describe the algorithms, your computational experiment, and the results of your comparison. Be careful to distinguish differences between the algorithms themselves and differences due to implementations of the algorithms. The paper should draw conclusions on which algorithm is best for which types of data sets. The conclusions of your study in your research paper should be one or more statements similar to:

"At a __% confidence level, the experiments lead to the following conclusions: Algorithm A performed better than algorithm B under the following conditions: ... Algorithm B performed better under the following conditions: .... Algorithms A and B did not perform differently under the following conditions..."

If you use different experiment designs for the different algorithms, then use the methods in Chapter 13.4 of Jain to calculate confidence intervals to compare the algorithms. If you do not consider workload, the choice of compiler, or the use of compiler options as primary factory, you must give quantitative evidence of why these factors did not *bias* the conclusions above.