Discussion of 2^2 designs...


Example

Consider a workstation design with just two factors: memory and cache size.

We perform 2^2 = 4 experiments, one for each combination of the low and high values of the two factors.

The response variable, throughput in MIPS, is listed below for each experiment:

Cache size |         Memory size
(kbytes)   |    4 Mbytes     16 Mbytes
-----------+---------------------------
1          |    15           45
2          |    25           75

We can create a model of the response variable as a function of the factor levels using regression.


Details of Creating a Regression Model

Let y denote the throughput in MIPS.

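Each factor is coded as a two-level variable that takes the value -1 at its low level and +1 at its high level:

     x_a = -1 if memory is 4 Mbytes,   +1 if memory is 16 Mbytes
     x_b = -1 if cache is 1 kbyte,     +1 if cache is 2 kbytes

The performance y can then be regressed on x_a and x_b using a regression model that is nonlinear in the factors because of the product term x_a * x_b: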

     y  =  q_0  +  q_a * x_a  +  q_b * x_b + q_ab * x_a * x_b 

where the q's are called the effects.

Substituting the four observations yields:

15 = q_0 - q_a - q_b + q_ab
45 = q_0 + q_a - q_b - q_ab
25 = q_0 - q_a + q_b - q_ab
75 = q_0 + q_a + q_b + q_ab

There is a unique solution for the four effects:

    q_0  = 40
    q_a  = 20
    q_b  = 10
    q_ab = 5
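
The solution can be checked mechanically. The sketch below assumes the -1/+1 coding implied by the four equations above (low level = -1, high level = +1). Because the sign columns are mutually orthogonal, each effect is the sum of the observations weighted by the matching column of signs, divided by 4. The names `observations` and `estimate_effects` are illustrative only and do not come from the notes.

```python
# Observations keyed by coded levels (x_a, x_b): -1 = low level, +1 = high level.
# x_a codes memory (4 Mbytes -> -1, 16 Mbytes -> +1).
# x_b codes cache  (1 kbyte  -> -1, 2 kbytes  -> +1).
observations = {
    (-1, -1): 15,
    (+1, -1): 45,
    (-1, +1): 25,
    (+1, +1): 75,
}

def estimate_effects(obs):
    """Return (q_0, q_a, q_b, q_ab) for a 2^2 design via the sign-table method."""
    n = len(obs)  # 4 runs in a 2^2 design
    q_0 = sum(y for y in obs.values()) / n
    q_a = sum(x_a * y for (x_a, x_b), y in obs.items()) / n
    q_b = sum(x_b * y for (x_a, x_b), y in obs.items()) / n
    q_ab = sum(x_a * x_b * y for (x_a, x_b), y in obs.items()) / n
    return q_0, q_a, q_b, q_ab

effects = estimate_effects(observations)
print(effects)  # (40.0, 20.0, 10.0, 5.0)
```

For example, adding all four equations cancels every term except q_0, giving 4 * q_0 = 160, so q_0 = 40.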

The resulting model is:

     y  =  40  +  20 * x_a  +  10 * x_b  +  5 * x_a * x_b

Thus the mean performance is 40 MIPS; the effect of memory is 20 MIPS; the effect of cache is 10 MIPS; and the interaction between memory and cache accounts for 5 MIPS.

Question: Why did we treat the factors as categorical? After all, memory sizes are integers, not categories!