CS5014: Homework 3

Due Friday, 15 September

Please turn in a hard copy of your solution to this homework, typeset in LaTeX. You may not receive full credit if the pages of your homework are not stapled together.

  1. Consider Table 4 of the Abrams, Standridge, Abdulla, Williams, and Fox paper from HW2. Compute all indices of dispersion from section 12.8 of Jain for the six values representing the lifetime of a document in the cache for the classroom workload and the LRU replacement policy. Which index would you choose, and why? (Note: We did not discuss indices of dispersion in class, but I trust that you can read and apply sections 12.8 and 12.9 in Jain on your own. Also refer to problems 12.13 and 12.14 in Jain if you are unsure of your answer -- the answers are in the back of Jain!)

  2. Repeat the previous problem, but for indices of central tendency. (Refer to problems 12.8, 12.10, and 12.11 in Jain if you are unsure of your answer.)

  3. Do problem 12.15 in Jain. Use gnuplot to generate an encapsulated postscript file of the graph, and include the graph in your LaTeX document.

  4. Consider a data set with n samples, denoted X_S(1), ..., X_S(n). For Q-Q plots, why is it inconvenient to have an empirical distribution function F_S(x) such that F_S(X_S(n)) = 1?

  5. In class, we defined a Q-Q plot to test whether a continuous theoretical distribution fits empirical data. What difficulty arises when you try to define a Q-Q plot to test whether a discrete theoretical distribution fits empirical data?
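
If you want to check your hand calculations for problem 1, the following is a minimal Python sketch of several common indices of dispersion. It is not the required method, and it rests on some assumptions: the function name dispersion_indices and the sample values are hypothetical (they are not the Table 4 data), the variance uses the n - 1 divisor, and the quartiles are found by linear interpolation, which may differ slightly from the rounding rule your textbook uses for small samples.

```python
import statistics


def dispersion_indices(samples):
    """Return several common indices of dispersion for a list of numbers."""
    n = len(samples)
    mean = statistics.fmean(samples)
    variance = statistics.variance(samples)  # sample variance, n - 1 divisor
    std_dev = variance ** 0.5
    # Quartiles by linear interpolation; other conventions give other values.
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "range": max(samples) - min(samples),
        "variance": variance,
        "std_dev": std_dev,
        "cov": std_dev / mean,  # coefficient of variation
        "siqr": (q3 - q1) / 2,  # semi-interquartile range
        "mean_abs_dev": sum(abs(x - mean) for x in samples) / n,
    }


# Hypothetical values for illustration only; substitute your own data.
example = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
result = dispersion_indices(example)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With only six observations the indices can disagree noticeably about how spread out the data are, which is the point of asking which one you would choose.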