CS5014: Homework 5

Due Friday, 29 September

Please turn in a hardcopy of the solution to this homework, written using LaTeX. You may not receive full credit if you do not staple together the pages of your homework.

Construct a simple linear regression model of the time required to run the "latex" command on a computer of your choice as a function of input file size.

To conduct your experiment, run "time latex" on each input file and use the total elapsed (real) time that "time" reports as the response variable. Use the file size in characters (bytes), as reported by the "ls -l" command, as the predictor variable. Use several LaTeX files that you have written earlier this semester, such as homeworks for class.
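
The measurement loop described above could be automated along the following lines. This is only a sketch under stated assumptions, not a required procedure: it uses Python's wall-clock timer in place of the shell's "time" command, so the numbers may differ slightly from what "time" reports, and the helper names (file_size_bytes, time_command, measure) are made up for illustration.

```python
import os
import subprocess
import time


def file_size_bytes(path):
    """Return the file size in bytes, the same number "ls -l" shows."""
    return os.path.getsize(path)


def time_command(argv):
    """Run argv once and return the wall-clock elapsed time in seconds.

    Assumption: wall-clock time from perf_counter stands in for the
    "real" time printed by the shell's "time" command.
    """
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # keep latex from waiting on the terminal
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,                # record the time even if the run fails
    )
    return time.perf_counter() - start


def measure(paths, command=("latex",)):
    """Time the command on each file, in the order given.

    Returns a list of (path, size_in_bytes, elapsed_seconds) rows, which
    is the kind of run table that item 2 asks for.
    """
    rows = []
    for path in paths:
        size = file_size_bytes(path)
        elapsed = time_command([*command, path])
        rows.append((path, size, elapsed))
    return rows
```

For example, measure(["hw1.tex", "hw2.tex"]) would return one row per file in run order (the file names here are hypothetical). Whatever method you use, record the order of the runs, since caching can make a repeated run on the same file faster than the first.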

Include the following items with the solution that you turn in:

  1. State what machine and LaTeX version you used to make your observations.

  2. Summarize the procedure you used to make the measurements. (Give a detailed answer -- we will use this in a later class. For example, give a table with the exact order of runs that you make, including the file size used and response variable observed.)

  3. Visually verify the assumptions for regression using the graphs discussed in section 14.7 of Jain. Include the graphs in the solution you turn in. Comment on the quality of the model.

  4. Compute MSE.

  5. Compute R^2 and comment on the quality of your model.

  6. For what file size is your model most accurate? For what file sizes is your model least accurate?

  7. What do you believe is responsible for the variation of the response variable due to errors?

  8. For each of the common mistakes listed in [Jain, 15.6], state whether the mistake is potentially relevant to your solution. For all potentially relevant mistakes, evaluate whether your solution suffers to any degree from the mistake.
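
The quantities requested in items 3 through 6 can be computed from the (file size, elapsed time) pairs roughly as sketched below. The sketch assumes the usual least-squares definitions for a one-predictor model, with MSE taken as SSE / (n - 2); check these against the definitions in Jain before relying on them. The function and key names are illustrative only.

```python
import math


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x, with summary statistics.

    Assumes MSE = SSE / (n - 2), i.e. two estimated parameters.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each take more than one value")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals are what the visual checks in item 3 are plotted from.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    return {
        "n": n, "b0": b0, "b1": b1, "x_bar": x_bar, "sxx": sxx,
        "residuals": residuals,
        "sse": sse,
        "mse": sse / (n - 2),
        "r2": 1.0 - sse / sst,
    }


def mean_response_std_error(fit, x_p):
    """Standard error of the predicted mean response at x = x_p.

    It is smallest at x_p equal to the mean of the observed x values
    and grows as x_p moves away from it, which bears on item 6.
    """
    s_e = math.sqrt(fit["mse"])
    return s_e * math.sqrt(1.0 / fit["n"] + (x_p - fit["x_bar"]) ** 2 / fit["sxx"])
```

Calling fit_simple_linear_regression(sizes, times) on your own measurements returns the coefficients, the residuals, MSE, and R^2 in one dictionary; the interpretation of those numbers is still yours to write up.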