60 minutes
Open Book, Open Notes
Be careful not to devote too much time to any single problem. Use the point values listed to guide your allocation of time. Good luck -- I want you to do well!
Problem Score
1 /15
2 /25
3 /24
4 /36
Total /100
1. Suppose we perform an experiment and find that the 90% confidence interval for a parameter P is (10.1, 13.4). Suppose you are testifying at a court trial as an expert in statistics. Explain in layman's terms (i.e., in a non-technical way) what this statement means: "According to our experiment analysis, the 90% confidence interval for parameter P is (10.1, 13.4)."
2. In the experiment designs of Chapters 20-22, could we determine if the effect of a particular level of a factor on the response variable is significant without using the F-test?
_____ YES ____ NO
If your answer is yes, explain exactly how you could do this. If your answer is no, explain why confidence intervals alone are insufficient. (Hint: Jain uses confidence intervals in Chapters 18-22, but only uses the F-test in Chapters 20-22.)
3. Shown below are the response variable values observed in four different experiments. For each, state whether there is interaction between factors A and B, and justify your answer. (Hint: What would the interaction diagram look like in each case?)
(a)
Levels of      Levels of Factor A
Factor B       a1   a2   a3   a4
b1              4    4    4    4
b2              4    4    4    4
_____ There is interaction between A and B.
_____ There is no interaction between A and B.
Justification:
(b)
Levels of      Levels of Factor A
Factor B       a1   a2   a3   a4
b1              4    4    4    4
b2              5    5    5    5
_____ There is interaction between A and B.
_____ There is no interaction between A and B.
Justification:
(c)
Levels of      Levels of Factor A
Factor B       a1   a2   a3   a4
b1              2    4    8   16
b2              2    4    8   16
_____ There is interaction between A and B.
_____ There is no interaction between A and B.
Justification:
(d)
Levels of      Levels of Factor A
Factor B       a1   a2   a3   a4
b1              1    2    3    4
b2              5    6    7    8
_____ There is interaction between A and B.
_____ There is no interaction between A and B.
Justification:
4. [36 points] Consider the following data collected from an experiment:
Table 1. Average annual income of U.S. basketball players, in thousands of dollars
                     Factor A: Height
Factor B: Age        < 2 meters    >= 2 meters
< 30                    120            260
>= 30                   100            140
(a) [9 points] On a separate sheet of paper, derive a model equation using a 2^2 experiment design (Jain Ch. 17) using an additive model. Do not analyze the model. Staple the separate sheet of paper to the exam when you turn in the exam.
Model equation (using symbols only): __________________________________
List values of parameters in the equation:
(b) [9 points] Repeat (a), but using the two-factor full factorial design (Jain Ch. 21).
Model equation (using symbols only): __________________________________
List values of parameters in the equation:
(c) [9 points] Compare the models you obtained in (a) and (b): Are they the same? If not, why are they different, and is one model clearly better in all regards?
Answers:
1. If we repeat the experiment a large number of times and compute a 90% confidence interval for each experiment by the same technique, then the proportion of these intervals that contain the true mean should be 90%. See Figure 13.1 in Jain. You may optionally state that the parameter P is "big enough to worry about" because (10.1, 13.4) excludes zero.
Grading:
-3: Student answered, "The mean lies in the interval with probability 0.9." (The statement is misleading, because the interpretation of the probability 0.9 is not stated. If the student meant that "probability 0.9" means "perform a large number of experiments and compute a 90% confidence interval using the same technique, and 90% of these intervals will contain the mean," then the statement is correct. The point deduction arises because of the ambiguity.)
-1: Student answered "We are 90% sure [or confident] that the true mean lies within the interval." The statement is correct, but what does "sure" [or "confident"] mean?
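The repeated-experiment interpretation in answer 1 can be checked with a small simulation. The sketch below is only illustrative: the true mean, standard deviation, and sample size are made-up values (not from the exam), and it assumes normally distributed observations with a known standard deviation so that a z-based interval applies.

```python
import random
from statistics import NormalDist

def coverage(true_mean=12.0, sigma=3.0, n=25, level=0.90,
             trials=10_000, seed=1):
    """Fraction of simulated experiments whose interval contains true_mean.

    All default values are hypothetical; sigma is assumed to be known.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # One "experiment": n observations and the interval built from them.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

The printed fraction should land close to 0.90. Any single interval either contains the true mean or it does not; the 90% describes the procedure over many repetitions.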
2. First note that the problem asks about significance of a level of a factor. The answer is yes:
One can calculate a confidence interval for the level of a factor of interest to determine if that level is significant. See Table 21.7 for an example.
Grading:
-3: Some people misread the problem by omitting the words "of a particular level" and came to the conclusion "yes." The point deduction arises because the wrong question was answered.
Some of these people explained that one would need to compute the confidence interval of each level alone and then ask if all levels are significant to get the same result as the F-test. This is not quite correct; see explanation under "-7".
-7: Some people misread the problem by omitting the words "of a particular level" and came to the conclusion that a confidence interval would be insufficient. This answer is correct for the misread question for the following reason (which I believe no one realized):
The F-test allows testing of the hypothesis that a set of two or more sample populations has the same mean. One could use the t-test on all pairwise combinations of samples, but this process would (1) require more computation than the F-test and (2) increase the probability of falsely rejecting at least one of the hypotheses as the number of t-tests increases. This is because the samples are being used for more than one test. For example, testing 10 hypotheses, each at 95% confidence, produces an overall confidence of only about 60%! See pages 404-405 in Ott for a discussion of t-tests versus F-tests.
-12: Some people reasoned that the F test was required to judge whether the experimental error was too large for a factor level to be significant. This is incorrect, because the fact that the confidence interval excludes zero means that we have sufficient observations to conclude with a certain confidence level that the effect of a certain factor level is always non-zero.
-15: No real understanding of the use of a confidence interval versus an F test.
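The 60% figure quoted in the "-7" explanation can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes the individual tests are independent; pairwise t-tests that reuse the same samples are not independent, so the result is only an approximation. The function names are illustrative.

```python
from math import comb

def pairwise_tests(num_populations):
    """Number of pairwise t-tests needed to compare every pair of populations."""
    return comb(num_populations, 2)

def overall_confidence(num_tests, per_test_confidence=0.95):
    """Probability that none of the tests falsely rejects (independence assumed)."""
    return per_test_confidence ** num_tests

if __name__ == "__main__":
    k = pairwise_tests(5)  # five populations give ten pairs
    print(k, round(overall_confidence(k), 3))
```

With five populations there are ten pairs, and 0.95 raised to the tenth power is roughly 0.60.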
3. There is no interaction in any of the cases. For each case we give two justifications: the first is based on the table cell values alone, and the second on an interaction graph.
(a)
Using table alone:
Varying A does not change the observed response of 4. Hence there can be no interaction between A and any other factor. (You could also state this using B rather than A.)
Using interaction graph:
The interaction graph with the x-axis representing A contains one horizontal line representing both levels of B. Because the lines for the two levels coincide, they are parallel, so there is no interaction. (You could also state this by interchanging A and B.)
(b)
Using table alone:
Varying A and holding B constant produces no change in the response variable. Hence there is no interaction of A with any other factor.
Using interaction graph:
The interaction graph with the x-axis representing A contains two horizontal lines, each corresponding to one level of B. Because the lines are parallel, there is no interaction.
(c)
Using table alone:
Varying B and holding A constant produces no change in the response variable. Hence there is no interaction of B with any other factor.
Using interaction graph:
The interaction graph with the x-axis representing A contains one curve representing both levels of factor B. Because the curves for the two levels coincide, they are parallel, so there is no interaction.
(d)
Using table alone:
Increasing factor A from a_i to a_(i+1) (1 <= i < 4) while holding B constant always increases the response variable by 1 regardless of the level of B. Similarly, increasing B from b1 to b2 while holding A constant increases the response variable by 4 regardless of the level of A. Hence there is no interaction between A and B.
Using interaction graph:
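The interaction graph with the x-axis representing A contains two lines, one for each level of B. Each line rises by 1 per level of A, and the two lines are separated vertically by 4, so they are parallel and there is no interaction.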
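The "parallel lines" argument used in all four cases can also be checked numerically. The sketch below computes, for each cell, the observed value minus its row mean and column mean plus the grand mean; if every such term is zero, the lines of the interaction graph are parallel. The function names are illustrative, and the check applies to unreplicated tables like these, where no error estimate is available.

```python
def interaction_terms(table):
    """Cell value minus row mean minus column mean plus grand mean."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(cols)] for i in range(rows)]

def has_interaction(table, tol=1e-9):
    """True if any interaction term differs from zero by more than tol."""
    return any(abs(t) > tol for row in interaction_terms(table) for t in row)

# Rows are levels b1, b2 of factor B; columns are levels a1..a4 of factor A.
TABLES = {
    "a": [[4, 4, 4, 4], [4, 4, 4, 4]],
    "b": [[4, 4, 4, 4], [5, 5, 5, 5]],
    "c": [[2, 4, 8, 16], [2, 4, 8, 16]],
    "d": [[1, 2, 3, 4], [5, 6, 7, 8]],
}

if __name__ == "__main__":
    for name, table in TABLES.items():
        print(name, has_interaction(table))
```

For contrast, a made-up table such as [[1, 2], [3, 5]] has non-parallel lines, and the same function reports interaction for it.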

4.
(a)
y = q_0 + q_a x_a + q_b x_b + q_ab x_a x_b, where:
q_0 = 155
q_a = 45
q_b = -35
q_ab = -25
Work:
   I     A     B    AB     y
   1    -1    -1     1    120
   1     1    -1    -1    260
   1    -1     1    -1    100
   1     1     1     1    140
 620   180  -140  -100    Total
 155    45   -35   -25    Total/4
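The sign-table arithmetic above can be reproduced in a few lines. This is a minimal sketch, not part of the original answer key; it assumes the observations are listed in the same row order as the table (A is height, B is age, with -1 for the lower level), and the function name is illustrative.

```python
def sign_table_effects(y):
    """Effects of a 2^2 design from four observations.

    y must be ordered as (A, B) = (-1, -1), (+1, -1), (-1, +1), (+1, +1).
    """
    levels = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    columns = {
        "q0": [1 for _ in levels],
        "qa": [a for a, _ in levels],
        "qb": [b for _, b in levels],
        "qab": [a * b for a, b in levels],
    }
    # Each effect is the dot product of its sign column with y, divided by 4.
    return {name: sum(s * v for s, v in zip(col, y)) / 4
            for name, col in columns.items()}

if __name__ == "__main__":
    print(sign_table_effects([120, 260, 100, 140]))
```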
(b)
y_ij = μ + α_j + β_i + e_ij, where:
μ = 155
α_1 = -45, α_2 = 45
β_1 = 35, β_2 = -35
e_11 = -25, e_12 = 25
e_21 = 25, e_22 = -25
Work:
              < 2 m   >= 2 m   Row sum   Row mean   Row effect
< 30            120      260       380        190           35
>= 30           100      140       240        120          -35
Col sum         220      400       620
Col mean        110      200                  155
Col effect      -45       45
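The row and column bookkeeping above can be sketched as a short function. It assumes one observation per cell, with rows indexed by age (i) and columns by height (j), as in the work table; the function name and return layout are illustrative, not taken from Jain.

```python
def two_factor_effects(table):
    """Grand mean, column effects, row effects, and residuals of a two-way table."""
    rows, cols = len(table), len(table[0])
    mu = sum(sum(r) for r in table) / (rows * cols)
    # Row effect: row mean minus grand mean.
    beta = [sum(r) / cols - mu for r in table]
    # Column effect: column mean minus grand mean.
    alpha = [sum(table[i][j] for i in range(rows)) / rows - mu
             for j in range(cols)]
    # Residual: what remains after removing the mean and both effects.
    e = [[table[i][j] - mu - alpha[j] - beta[i] for j in range(cols)]
         for i in range(rows)]
    return mu, alpha, beta, e

if __name__ == "__main__":
    mu, alpha, beta, e = two_factor_effects([[120, 260], [100, 140]])
    print(mu, alpha, beta, e)
```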
(c) The two models are identical (except that obtaining the first model requires fewer arithmetic operations). To see this, consider the four equations for y obtained from the first model:
A (height)   B (age)   y
< 2 m        < 30      155 + 45(-1) + (-35)(-1) + (-25)(-1)(-1) = 120
>= 2 m       < 30      155 + 45(1) + (-35)(-1) + (-25)(1)(-1) = 260
< 2 m        >= 30     155 + 45(-1) + (-35)(1) + (-25)(-1)(1) = 100
>= 2 m       >= 30     155 + 45(1) + (-35)(1) + (-25)(1)(1) = 140
Using the second model:
A (height)   B (age)   y
< 2 m        < 30      155 + (-45) + 35 + (-25) = 120
>= 2 m       < 30      155 + 45 + 35 + 25 = 260
< 2 m        >= 30     155 + (-45) + (-35) + 25 = 100
>= 2 m       >= 30     155 + 45 + (-35) + (-25) = 140
The entries for y in the corresponding row of each table are identical!
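The cell-by-cell comparison can be confirmed mechanically. The sketch below hard-codes the parameter values from answers (a) and (b) and evaluates both models at each of the four cells; the coding of levels as -1 and +1 and the function names are choices made for this illustration.

```python
# Observed incomes keyed by (x_a, x_b): x_a = -1 for < 2 m, +1 for >= 2 m;
# x_b = -1 for age < 30, +1 for age >= 30.
OBSERVED = {(-1, -1): 120, (1, -1): 260, (-1, 1): 100, (1, 1): 140}

def model_a(xa, xb):
    """2^2 design model with the effects from answer (a)."""
    return 155 + 45 * xa + (-35) * xb + (-25) * xa * xb

def model_b(xa, xb):
    """Two-factor full factorial model with the values from answer (b)."""
    alpha = {-1: -45, 1: 45}   # column (height) effects
    beta = {-1: 35, 1: -35}    # row (age) effects
    e = {(-1, -1): -25, (1, -1): 25, (-1, 1): 25, (1, 1): -25}
    return 155 + alpha[xa] + beta[xb] + e[(xa, xb)]

if __name__ == "__main__":
    for cell, y in OBSERVED.items():
        print(cell, y, model_a(*cell), model_b(*cell))
```

In every cell the residual of the second model equals the interaction term q_ab x_a x_b of the first, which is why the two models reproduce the same values.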
Grading:
In (b):
-3 if eij's are not computed.
In (c):
-5 if they answered "no" but gave logical differences (interaction in (a) but not in (b); more parameters in (b), ...)
-7 if they answered "no" and gave a false statement (e.g., the full factorial model has more information)