Confidence Intervals


References:

Raj Jain, The Art of Computer Systems Performance Analysis, Wiley, 1991, Chapter 13

Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, third edition, McGraw-Hill, 1991, Ch. 9.


  1. Sample Versus Population [Jain, 13.1]

  2. Inferring the Mean by a Confidence Interval [Jain, 13.2]

  3. Testing for a Zero Mean [Jain, 13.3]

  4. Comparing Two Alternatives with Paired Observations [Jain, 13.4.1]

  5. Comparing Two Alternatives with Unpaired Observations [Jain, 13.4.2, 13.4.3]

  6. What Confidence Level to Use [Jain, 13.5]

  7. Why CIs and Not Hypothesis Testing [Jain, 13.6]

  8. Determining Sample Size [Jain, 13.9]

Sample Versus Population

Sample Versus Example

Jain aptly observes that "sample" and "example" originate from the same Old French word. So think of a "sample" that you analyze as an example!

Statistical Inference

Some terms:

parameter
a numerical descriptive measure for a population

statistic
a numerical descriptive measure for a sample drawn from a population

We can (almost) never compute a parameter. All we can do is compute a statistic from one or more samples drawn from the population.

How do we infer the parameter from the statistics?


Inferring the Mean by a Confidence Interval

The most fundamental parameter to estimate is the population mean (denoted mu). Let xbar denote the sample mean.

The best statement we can make about mu from a set of samples is:

  1. We cannot come up with a single number for mu from our sample, only bounds.

  2. We cannot say with certainty that mu lies within the bounds -- only with a certain probability.

Or:

There exist bounds c_1 and c_2 and a probability alpha such that:

        Probability{ c_1 <= mu <= c_2 } = 1 - alpha

Some terms:

Confidence interval:
The interval ( c_1, c_2 )

Significance level:
The fraction alpha, typically 0.05 or 0.1

Confidence level:
The percentage 100(1 - alpha), typically 95% or 90%

How to compute a 90% confidence interval (CI): Method 1

(From p. 243, eq. (9-5) in Papoulis)

  1. Collect n samples.

  2. Sort samples in ascending order.

  3. Compute the 5-percentile and 95-percentile (0.05-quantile and 0.95-quantile), which are the bounds of the confidence interval. Their (1-based) positions in the sorted list are:
            5-percentile position  = 1 + 0.05(n-1)
            95-percentile position = 1 + 0.95(n-1)
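The three steps can be sketched in Python (an illustration only; it uses linear interpolation between adjacent sorted samples at the 1-based positions given above):

```python
def percentile_ci(samples, alpha=0.10):
    """Method 1: distribution-free CI from sample quantiles.

    Uses the 1-based position 1 + q*(n-1) in the sorted samples,
    with linear interpolation between adjacent values.
    """
    xs = sorted(samples)
    n = len(xs)

    def quantile(q):
        pos = 1 + q * (n - 1)        # 1-based position in sorted list
        lo = int(pos)                # integer part of the position
        frac = pos - lo              # fractional part for interpolation
        if lo >= n:
            return xs[-1]
        return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

    # alpha = 0.10 gives the 5- and 95-percentiles, i.e., a 90% CI
    return quantile(alpha / 2), quantile(1 - alpha / 2)
```

For example, for the samples 1, 2, ..., 100 and alpha = 0.10, the interval is (5.95, 95.05).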
    

Note what happens as the confidence level (e.g., 90%) changes:

Method 1 is used when either


Sidebar: Why is CI Centered?

Why in Method 1 did we not choose an uncentered interval, say the 0- and 90-percentiles, or the 1- and 91-percentiles?

We desire the narrowest interval possible, because this will be the most specific answer that we can give for a confidence interval. Thus we want constants c_1 and c_2 that minimize c_2-c_1 but still satisfy


        Probability{ c_1 <= mu <= c_2 } = 1 - alpha

The "centered interval" satisfies this criterion when the distribution of the random variable being estimated is symmetrical (e.g., normally distributed). For asymmetrical distributions, the centered interval is often chosen for simplicity, recognizing that it may not be the smallest interval.


How to compute CI's: Method 2

Method 2 applies when estimating the population mean. The formula obtained is the same as in Method 1 when a normal distribution is substituted.

The Central Limit Theorem:

Given independent observations {x_1, x_2, ..., x_n} from the same population with mean mu and standard deviation sigma, the sample mean for large n is approximately normally distributed with mean mu and standard deviation sigma/sqrt(n).
xbar ~ N ( mu, sigma/sqrt(n))

So with n sufficiently large, we have a theoretical distribution that fits our "samples" of sample mean! (Recall last section on determining distributions.)

We usually do not know the population standard deviation sigma. But an unbiased estimator of sigma^2 is the sample variance s^2 [Papoulis, p. 247]. Thus an approximate 100(1-alpha)% CI for mu is:

        xbar +/- z_{1-alpha/2} s/sqrt(n)

where z_{1-alpha/2} is the (1-alpha/2)-quantile of a unit normal variate (see Table A.2 in Jain).
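A minimal sketch of Method 2 in Python (the z values below are standard normal quantiles for common confidence levels; confirm them against Table A.2):

```python
import math

# (1 - alpha/2)-quantiles of the unit normal for common confidence levels
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def normal_ci(samples, level=0.90):
    """Method 2: CI for the population mean via the Central Limit Theorem."""
    n = len(samples)
    xbar = sum(samples) / n
    # sample standard deviation s (unbiased estimator uses n - 1)
    s = math.sqrt(sum((x - xbar) ** 2 for x in samples) / (n - 1))
    half = Z[level] * s / math.sqrt(n)   # half-width of the interval
    return xbar - half, xbar + half
```

For instance, normal_ci([1, 2, 3, 4, 5], 0.95) gives roughly (1.61, 4.39); remember the method assumes n is large enough for the CLT to apply.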

Example:

Problem: Compute a CI for mean CPU time based on the 32 values of measured CPU time from the experiment in Jain, Example 12.4.

Solution:

How to compute CI's: Method 3

The Central Limit Theorem (above) says the mean is approximately normally distributed with standard deviation sigma/sqrt(n). But we used the sample standard deviation s, not sigma, in our CI formula! Why?

In general sigma is unknown. (We're estimating population parameters from samples, so we won't know a population parameter like sigma.) If the sample size is about 30 or more, s is a good estimate of sigma.

But if the sample size is smaller, we cannot do this!

W. S. Gosset faced this problem when he was a chemist for the Guinness brewery and had too few samples to establish a CI for the quality of each brew. He found that using Method 2 yielded incorrect CI's.

Gosset devised a more flexible distribution that:

Company policy prohibited publishing the result under his real name, so he used the pen name "Student", hence the name "Student's t-distribution."

Thus use t[1-alpha/2; n-1] rather than z_{1-alpha/2} in the CI formula.

Note that t[.,.] has two parameters, unlike z[.]. The second parameter to t, namely n-1, is the number of degrees of freedom.

See Table A.4 in Jain for tabulated t-distribution values.
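As a rough illustration of why the second parameter matters, the 0.95-quantile of t shrinks toward the normal quantile z_{0.95} = 1.645 as the degrees of freedom grow (the values below are transcribed from a standard t table; check them against Table A.4):

```python
# 0.95-quantiles of the t distribution for selected degrees of freedom,
# transcribed from a standard t table (verify against Jain, Table A.4)
T_95 = {5: 2.015, 10: 1.812, 30: 1.697}
Z_95 = 1.645  # 0.95-quantile of the unit normal

# With more degrees of freedom, t approaches z; small samples therefore
# get wider (more conservative) confidence intervals than Method 2 gives.
for df in sorted(T_95):
    assert T_95[df] > Z_95
```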

Interpretation of CI

Consider a 90% CI:

Relevance of CI's to Computer Measurement

Are the assumptions for the CI met for computer measurements? For example, suppose we want to compute a CI for the mean execution time of a computer program, or for the mean interarrival time of packets at a server in a network.


Testing for a Zero Mean

Suppose you measure a quantity that can be positive or negative. Positive might mean good, and negative bad. Based on several samples, is the measured effect with probability (1-alpha) good, bad, or inconclusive?

Simply compute a CI and see whether the interval includes zero!

Example:

You must evaluate whether a new optimization method for a compiler is worth releasing to customers. The optimization might reduce the execution time of some programs, but it might increase the execution time of others.

Should you add the optimization (with 99% confidence that customers will see a reduction in execution time)?

If the population mean mu (the mean execution time change for all programs that will ever be compiled) is greater than zero, then you should include the optimization.

To estimate mu, you try the optimization on 7 programs. The change in execution times observed (as multiplicative factors) are:

1.5, 2.6, -1.8, 1.3, -0.5, 1.7, 2.4

Now construct the 99% CI:

  1. Visually check for t distribution

  2. Compute n=7, xbar=1.03, s=1.60,

  3. Find t[0.995;6]=3.707 from Table A.4

  4. Compute CI: 1.03 +/- (3.707)(1.60)/sqrt(7) = (-1.21, 3.27)

The CI includes zero, so we cannot say with 99% confidence that the optimization either reduces or increases the mean execution time of all programs that will ever be compiled!

Incidentally, we can say with 99% confidence that the mean change lies between -1.21 and 3.27: the optimization could worsen mean execution time by as much as a factor of 1.21, or improve it by as much as a factor of 3.27. So while the potential benefit exceeds the potential harm, the data do not let us certify either outcome.
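The computation above can be reproduced with a short script (a sketch; the quantile t[0.995; 6] = 3.707 is taken from the table, not computed):

```python
import math

def t_ci(samples, t_quantile):
    """Small-sample CI for the mean: xbar +/- t * s / sqrt(n)."""
    n = len(samples)
    xbar = sum(samples) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in samples) / (n - 1))
    half = t_quantile * s / math.sqrt(n)
    return xbar - half, xbar + half

# Observed execution-time changes for the 7 test programs
changes = [1.5, 2.6, -1.8, 1.3, -0.5, 1.7, 2.4]
lo, hi = t_ci(changes, t_quantile=3.707)   # t[0.995; 6] for a 99% CI
# lo is about -1.22 and hi about 3.28: the interval includes zero,
# so the test is inconclusive at 99% confidence
```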


Comparing Two Alternatives with Paired Observations

Often we wish to compare the mean value of a parameter for two alternatives to see which is better. For example, which machine is faster on a given workload?

Simply compute the CI for the difference of the sample means. If the interval includes zero, then the machines are not significantly different in speed! See Jain Fig. 13.3.

Example:

Can we say with 90% confidence that either LRU-MIN or LRU-THOLD yields higher hit rate for proxy servers, based on Table 4 of the Abrams, Standridge, Abdulla, Williams, and Fox paper?

Table 4 contains six sample means from six experiments for each policy. Thus we must compare paired observations.

Solution:

  1. For each pair, subtract the corresponding values (e.g., 31.5-31.0=0.5). This yields six differences.

  2. Compute the 90% CI using the t distribution on the six differences (5 degrees of freedom).

  3. If the CI includes zero, the policies do not yield different hit rates with 90% confidence. Otherwise we can identify which policy yields the lower hit rate.

You'll have a chance to work this out in your next homework!
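A sketch of the paired-difference procedure, with made-up hit rates standing in for Table 4 (the real values come from the paper, so treat the numbers below as placeholders):

```python
import math

def paired_ci(xs, ys, t_quantile):
    """CI for the mean of paired differences: subtract pairwise,
    then apply the small-sample t formula to the differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    dbar = sum(diffs) / n
    s = math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))
    half = t_quantile * s / math.sqrt(n)
    return dbar - half, dbar + half

# Hypothetical hit rates (percent) for six paired experiments:
lru_min   = [31.5, 40.2, 36.8, 29.9, 42.1, 35.0]
lru_thold = [31.0, 39.5, 37.2, 29.0, 41.0, 34.1]
lo, hi = paired_ci(lru_min, lru_thold, t_quantile=2.015)  # t[0.95; 5]
# For this made-up data the CI excludes zero, so the policies
# would differ at 90% confidence
```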


Comparing Two Alternatives with Unpaired Observations

Suppose a paper comes out with a policy called "BTLM" (Better Than LRU-MIN). You want to determine, with 90% confidence, if BTLM is really better than LRU-MIN.

Probably the new paper uses different workloads than the Abrams, et al paper. So there will be no pairing of observations. In fact, the BTLM paper might use 20 observations, so the sample sizes are different.

To compare them, compute a CI for each paper separately. Use the approximate visual test in Jain Fig. 13.4:
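A minimal sketch of that decision rule as code (my reading of the visual test: non-overlapping CIs mean a significant difference, one mean inside the other's CI means no significant difference, and anything else calls for a t-test):

```python
def visual_test(ci_a, mean_a, ci_b, mean_b):
    """Approximate visual test for unpaired observations (Jain, Fig. 13.4).

    ci_a and ci_b are (low, high) confidence intervals for the two
    alternatives; mean_a and mean_b are their sample means.
    """
    lo_a, hi_a = ci_a
    lo_b, hi_b = ci_b
    if hi_a < lo_b or hi_b < lo_a:
        return "different"        # CIs do not overlap
    if lo_a <= mean_b <= hi_a or lo_b <= mean_a <= hi_b:
        return "not different"    # one mean lies inside the other CI
    return "need t-test"          # CIs overlap, but neither mean is inside
```

For example, CIs (1, 3) and (4, 6) are "different", while (1, 5) and (2, 6) with mean 4 inside the first interval are "not different".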


What Confidence Level to Use

The higher the confidence level, the wider the interval:

For 32 samples discussed earlier:

95% CI = (3.57, 4.23)

99% CI = (3.46, 4.33)

So as the confidence level approaches 100%, the interval width approaches infinity.

Too high a confidence level might yield uselessly wide bounds. Also the confidence level used depends on how critical the answer is.


Why CIs and Not Hypothesis Testing

The traditional statistics approach to statistical inference is to test a hypothesis. The null hypothesis might be

mu = mu_0 (for some hypothesized value mu_0)

The approach requires defining a test statistic and a rejection region. The test result is "ACCEPT" or "REJECT." The advantage is that a decision-maker gets a clear YES/NO answer.

Jain argues that using a CI (and testing if the CI includes a certain value, such as zero) is:

Compare:

Confidence interval:
The population mean hit rate is in (45.1,45.9) with probability 0.95

Statistical test:
The hypothesis that the mean hit rate is 45.6 is rejected at the 0.05 significance level.

Determining Sample Size

"How many times should I repeat my experiment?" This is a common question during experiment design. First you must choose the desired confidence level and the required accuracy (interval width).

Procedure:

  1. Run a trial experiment. Compute sample mean and standard deviation.

  2. Recall that the CI is xbar +/- z_{1-alpha/2} s/sqrt(n).

    Therefore, requiring the interval to lie within r% of the mean, i.e., xbar(1 +/- r/100), we can solve for n as a function of interval width (accuracy r) and confidence level:

            n = ( 100 z s / (r xbar) )^2
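The procedure can be sketched as follows, using the formula n = (100 z s / (r xbar))^2 rounded up to the next integer (the trial-experiment values in the usage line are hypothetical):

```python
import math

def samples_needed(xbar, s, r_percent, z):
    """Samples required so the CI half-width is within r% of the mean.

    xbar and s come from a trial experiment; z is the normal quantile
    for the chosen confidence level (e.g., 1.96 for 95%).
    """
    n = (100.0 * z * s / (r_percent * xbar)) ** 2
    return math.ceil(n)   # round up: n must be an integer sample count

# Hypothetical trial: mean 5.0, std dev 1.0, want 95% CI within +/- 5%
n = samples_needed(xbar=5.0, s=1.0, r_percent=5, z=1.96)
```

For these trial values the rule calls for 62 samples; a wider tolerance or lower confidence level shrinks n quadratically.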


Please send inquiries and comments to abrams@vt.edu.