Consider a workstation design with just two factors: memory size and cache size.
We perform 2**2 = 4 experiments using just the low and high values of each factor.
The response variable, throughput in MIPS, is listed below for each experiment:
Cache         |      Memory Size
Size (kbytes) | 4 Mbytes   16 Mbytes
--------------+----------------------
      1       |    15          45
      2       |    25          75
We can create a model of the response variable as a function of the factor levels using regression.
Let y denote the throughput in MIPS.
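The performance y can be regressed on two coded variables, x_a and x_b, that represent the categorical factors memory and cache, respectively.
Each variable is -1 at the low level of its factor and +1 at the high level: x_a = -1 for 4 Mbytes and +1 for 16 Mbytes; x_b = -1 for 1 kbyte and +1 for 2 kbytes.
The model includes an interaction term, so it is nonlinear in the factors (though still linear in the coefficients):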
y = q_0 + q_a * x_a + q_b * x_b + q_ab * x_a * x_b
where the q's are called the effects.
Substituting the four observations yields:
15 = q_0 - q_a - q_b + q_ab
45 = q_0 + q_a - q_b - q_ab
25 = q_0 - q_a + q_b - q_ab
75 = q_0 + q_a + q_b + q_ab
There is a unique solution for the four effects:
q_0 = 40
q_a = 20
q_b = 10
q_ab = 5
The resulting model is:
y = 40 + 20 * x_a + 10 * x_b + 5 * x_a * x_b
Thus the mean performance is 40 MIPS; the effect of memory is 20 MIPS; the effect of cache is 10 MIPS; and the interaction between memory and cache accounts for 5 MIPS.
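As a sanity check, the fitted model should reproduce the four measured throughputs exactly, since four coefficients were fitted to four observations. The sketch below assumes the coding implied by the equations (-1 for the low level, +1 for the high level of each factor); the names predict, coded, and predictions are hypothetical.

```python
def predict(x_a, x_b):
    """Fitted model from the text; x_a and x_b are coded as -1 (low) or +1 (high)."""
    return 40 + 20 * x_a + 10 * x_b + 5 * x_a * x_b


# (memory in Mbytes, cache in kbytes) -> assumed coded levels (x_a, x_b)
coded = {
    (4, 1): (-1, -1),
    (16, 1): (+1, -1),
    (4, 2): (-1, +1),
    (16, 2): (+1, +1),
}

# Evaluate the model at each of the four design points.
predictions = {config: predict(*levels) for config, levels in coded.items()}

for (memory, cache), mips in predictions.items():
    print(f"{memory:>2} Mbytes, {cache} kbytes cache: {mips} MIPS")
```

The four printed values are 15, 45, 25, and 75 MIPS, the same as the entries in the table of measurements.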
Question: Why did we treat the factors as categorical? After all, memory sizes are integers, not categories!