6. Estimation III: Confidence Intervals for \(p\), \(\sigma^2\) (via \(\chi^2\)), and \(\sigma_1^2/\sigma_2^2\) (via \(F\)) with Large-Sample Conditions
6.0 Notation Table
| Symbol | Meaning |
|---|---|
| \(n\) | Sample size (one sample) |
| \(n_1,\ n_2\) | Sample sizes (two samples) |
| \(X\) | Number of successes in \(n\) Bernoulli trials |
| \(\hat{p} = X/n\) | Sample proportion (point estimate of \(p\)) |
| \(q = 1-p,\ \hat{q} = 1-\hat{p}\) | Complement proportions |
| \(z_{\alpha/2}\) | Standard Normal critical value for a two-sided level \(\alpha\) |
| \(e\) | Target margin of error for a proportion interval |
| \(S^2,\ s^2\) | Sample variance (random variable / observed value) |
| \(\sigma^2,\ \sigma\) | Population variance and standard deviation |
| \(\nu = n-1\) | Degrees of freedom for a one-sample variance problem |
| \(\chi^2_{\alpha,\nu}\) | Chi-square critical value with right-tail area \(\alpha\) |
| \(F_{v_1,v_2}\) | F distribution with degrees of freedom \(v_1,\ v_2\) |
| \(v_1 = n_1-1,\ v_2 = n_2-1\) | Degrees of freedom for a two-sample variance-ratio problem |
| \(f_{\alpha}(v_1,v_2)\) | F critical value with right-tail area \(\alpha\) |
6.1 Introduction
In the previous estimation modules, the main target was a mean, such as \(\mu\) or a difference of means. Those intervals typically had a symmetric “estimate \(\pm\) margin of error” structure because the reference distributions (Normal or t) are symmetric. In many operations and quality settings, however, the key parameter is not a mean.
This module focuses on confidence intervals for (i) a binomial proportion \(p\) and (ii) process variability parameters, including \(\sigma^2\), \(\sigma\), and a ratio \(\sigma_1^2/\sigma_2^2\). These targets connect directly to managerial questions such as “What fraction is defective?” and “Is one process more variable than another?” The interval formulas remain systematic, but the reference distributions and interpretation details are different.
6.2 Learning Outcomes
After this module, you should be able to:
Construct and interpret a large-sample confidence interval for a population proportion \(p\)
Check practical conditions for using a Normal approximation in proportion intervals
Choose a sample size \(n\) to achieve a target margin of error for estimating \(p\)
Construct and interpret a confidence interval for \(\sigma^2\) and for \(\sigma\) under Normal sampling
Construct and interpret a confidence interval for the ratio \(\sigma_1^2/\sigma_2^2\) under independent Normal samples
Explain why variance and variance-ratio intervals are typically not symmetric around the point estimate
6.3 Main Concepts
6.3.1 Confidence interval for a single proportion \(p\)
Consider a process outcome that is classified into two categories, such as pass/fail or defective/nondefective. If each inspected item is independent and has the same success probability \(p\), then \(X\) (the number of successes in \(n\) trials) follows a binomial model. The natural point estimate is the sample proportion \(\hat{p} = X/n\).
For large \(n\), the sampling distribution of \(\hat{p}\) is approximately Normal with mean \(p\) and variance \(p(1-p)/n\). This approximation is used to build a confidence interval. In practice, the approximation is most reliable when the expected counts of successes and failures are not small, which is commonly checked using \(n\hat{p} \ge 5\) and \(n\hat{q} \ge 5\).
A widely used large-sample interval (often called the Wald interval) is:

\[
\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \;<\; p \;<\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}
\]
This interval has the familiar estimate-plus-or-minus form, but it can behave poorly when \(p\) is near 0 or 1 or when \(n\) is not large enough. Because \(p\) must lie in \([0,1]\), it is also important to notice when an approximate interval produces endpoints outside that range, as that is a sign that the approximation is under stress.
A more stable alternative comes from solving the Normal-approximation inequality for \(p\) directly, rather than substituting \(\hat{p}\) into the standard error. One convenient representation (often called the score, or Wilson, interval) is:

\[
\frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \;\pm\; z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}
\]
This adjusted interval is slightly more algebraic, but it tends to be closer to the advertised confidence level across a wider range of \(p\) values. For large \(n\), the adjusted and the Wald intervals become very similar, so the difference is mainly important for small-to-moderate sample sizes or extreme proportions.
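As a minimal sketch of how the adjusted interval is computed (standard library only; the function name `wilson_interval` is an illustrative choice, not a library API):

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Adjusted (score) interval for p: solve the Normal-approximation
    inequality for p instead of plugging p-hat into the standard error."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# With 485 passes in 500 trials at 90% confidence (the Example 6.1 data),
# the result is close to the Wald interval because n is large.
lo, hi = wilson_interval(485, 500, conf=0.90)
```

Unlike the Wald form, both endpoints of this interval always fall inside \([0,1]\), which is one reason it behaves better for extreme proportions.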
Example 6.1
A factory runs a comprehensive electrical test on finished devices before shipment. In a random sample of \(n=500\) devices, \(15\) fail at least one test, so \(485\) pass. The quality engineer wants a 90% confidence interval for the true pass probability \(p\).
Question: What is a 90% confidence interval for \(p\) and how should it be interpreted?
The point estimate is \(\hat{p}=485/500=0.97\). Using \(z_{0.05}\approx 1.645\), the estimated standard error is \(\sqrt{\hat{p}(1-\hat{p})/n}=\sqrt{0.97\cdot 0.03/500}\approx 0.00763\). The Wald interval is \(0.97 \pm 1.645(0.00763)\), which gives approximately \(0.9575 \le p \le 0.9825\).
Answer: A 90% confidence interval is approximately \([0.9575,\ 0.9825]\). Interpreted operationally, the data support that the long-run pass rate is likely between about 95.8% and 98.3%, under stable process conditions and random sampling.
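The arithmetic in Example 6.1 can be reproduced in a few lines of Python (standard library only; `wald_interval` is an illustrative name):

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Large-sample (Wald) interval: p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    return p_hat - z * se, p_hat + z * se

# Example 6.1: 485 passes in 500 devices, 90% confidence
lo, hi = wald_interval(485, 500, conf=0.90)
print(round(lo, 4), round(hi, 4))  # -> 0.9575 0.9825
```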
Figure 6.1 — Empirical coverage for proportion intervals (why conditions matter)
The main message of this figure is that not all “95%” proportion intervals achieve 95% coverage in finite samples, especially when \(p\) is near 0 or 1. The purpose is to connect the formula conditions (large-sample approximation) to an observable performance metric: how often the interval contains the true \(p\). The curves are obtained by simulation, which is appropriate because coverage is defined by repeated sampling under the same true parameter.
In this simulation, the data are not collected from a real factory; they are generated from a binomial model with a fixed true \(p\) and a fixed sample size \(n\). One “repetition” means generating one random sample of size \(n\), computing the confidence interval from that sample, and recording whether the interval contains the true \(p\). The figure repeats this many times for each \(p\) value to estimate the long-run coverage probability.
To read the figure, first choose a sample size \(n\) from the dropdown menu. Then, for each value on the horizontal axis (the true \(p\)), compare the empirical coverage of the two methods to the horizontal reference line at 0.95. The empirical curves are simulation-based estimates, while the 0.95 line is the theoretical target (the nominal confidence level).
The statistical message is that the Wald interval can undercover substantially for small \(n\) and for extreme \(p\), meaning the true \(p\) is inside the interval less often than advertised. As \(n\) increases, both methods improve, but the adjusted interval typically stays closer to 0.95 for a wider range of \(p\). For quality work, this matters because rare-defect settings (\(p\) near 1 for pass, or near 0 for defect) are common, so checking counts like \(n\hat{p}\) and \(n\hat{q}\) is an applied risk-control step rather than a formal detail.
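The simulation logic behind Figure 6.1 can be sketched as follows (standard library only; the function name, seed, and repetition count are illustrative choices):

```python
import random
from math import sqrt
from statistics import NormalDist

def wald_coverage(p_true, n, conf=0.95, reps=10_000, seed=1):
    """Estimate coverage by repeated sampling: each repetition draws one
    binomial sample of size n, builds the Wald interval from it, and
    records whether the interval contains the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p_true for _ in range(n))  # one binomial draw
        p_hat = x / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        hits += (p_hat - z * se <= p_true <= p_hat + z * se)
    return hits / reps

# Near p = 0.5 the Wald interval runs close to (slightly below) 0.95;
# for extreme p (here n*p = 1) it undercovers badly, as in the figure.
print(wald_coverage(0.5, 50))
print(wald_coverage(0.02, 50))
```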
6.3.2 Sample size planning for estimating \(p\)
When planning an inspection study, it is common to choose \(n\) so that the interval half-width does not exceed a target error tolerance \(e\). For large-sample planning, a typical approximation is based on the Wald margin of error:

\[
e = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
\]

Solving for \(n\) gives:

\[
n = \frac{z_{\alpha/2}^2\, p(1-p)}{e^2},
\]

rounded up to the next whole number.
This planning equation requires a working value for \(p\). If a pilot study or past data provide a reasonable estimate, substitute \(\hat{p}\) for \(p\). If no estimate is available, a conservative design uses the fact that \(p(1-p)\) is maximized at \(p=0.5\), which yields an upper bound on the required sample size.
Example 6.2
A service operation surveys customers to estimate the fraction who would recommend the service. Management wants a 95% confidence interval with margin of error at most \(e=0.02\), but there is no reliable prior estimate of \(p\). The survey can be designed with any sample size \(n\).
Question: What sample size guarantees the target margin of error under the conservative design?
Using the conservative choice \(p(1-p)\le 0.25\) and \(z_{0.025}\approx 1.96\), the planning rule becomes \(n \approx z_{0.025}^2(0.25)/e^2\). Substituting \(e=0.02\) gives \(n \approx (1.96^2)(0.25)/(0.02^2)=2401\).
Answer: A conservative design uses \(n=2401\). This ensures the planned half-width is at most 0.02 at the 95% level, regardless of the true underlying proportion.
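The planning rule in Example 6.2 is a one-line computation (standard library; `sample_size_for_p` is an illustrative name):

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_p(e, conf=0.95, p_guess=0.5):
    """Smallest n whose Wald half-width is at most e; the default
    p_guess = 0.5 is the conservative choice that maximizes p(1-p)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil(z**2 * p_guess * (1 - p_guess) / e**2)

print(sample_size_for_p(0.02))               # conservative design -> 2401
print(sample_size_for_p(0.02, p_guess=0.1))  # far smaller if p is known to be small
```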
6.3.3 Confidence interval for \(\sigma^2\) and \(\sigma\) (Normal sampling)
Variability is often the operational constraint, not the mean. For example, even if the average fill weight meets the target, excessive variation can produce too many underfilled or overfilled units. In these situations, the parameter of interest is \(\sigma^2\) or \(\sigma\).
If \(X_1,\dots,X_n\) are sampled from a Normal population with variance \(\sigma^2\), then the statistic

\[
\chi^2 = \frac{(n-1)S^2}{\sigma^2}
\]

has a chi-square distribution with \(\nu = n-1\) degrees of freedom.
This is the key pivot for variance inference. Define \(\chi^2_{\alpha,\nu}\) as the chi-square value such that \(P(\chi^2_{\nu}>\chi^2_{\alpha,\nu})=\alpha\). Because \(\chi^2_{\nu}\ge 0\) and is right-skewed for small \(\nu\), the resulting confidence interval for \(\sigma^2\) is not symmetric around \(s^2\).
A two-sided \(100(1-\alpha)\%\) confidence interval for \(\sigma^2\) is:

\[
\frac{(n-1)s^2}{\chi^2_{\alpha/2,\nu}} \;<\; \sigma^2 \;<\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\nu}}
\]
A confidence interval for \(\sigma\) is obtained by taking the square root of both endpoints. This transformation is monotone on \((0,\infty)\), so the confidence level is preserved.
Example 6.3
An automated filling machine dispenses liquid detergent into bottles. A random sample of \(n=20\) bottles yields a sample variance of fill volume \(s^2=0.0153\) (in squared fluid ounces). The process is modeled as approximately Normal for fill volume near the target.
Question: Find a 95% confidence interval for \(\sigma^2\) and for \(\sigma\).
Here \(\nu=n-1=19\). Under the right-tail convention, the 95% interval uses \(\chi^2_{0.025,19}\approx 32.852\) and \(\chi^2_{0.975,19}\approx 8.907\), so the variance interval is \((19)(0.0153)/\chi^2_{0.025,19} \le \sigma^2 \le (19)(0.0153)/\chi^2_{0.975,19}\). Numerically, this gives approximately \(0.00885 \le \sigma^2 \le 0.03264\). Taking square roots yields approximately \(0.0941 \le \sigma \le 0.1807\).
Answer: A 95% confidence interval is approximately \([0.00885,\ 0.03264]\) for \(\sigma^2\) and \([0.0941,\ 0.1807]\) for \(\sigma\). Interpreted for process control, the true standard deviation could plausibly be as high as about 0.18, which should be evaluated against specification risk.
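Example 6.3 can be reproduced with SciPy's chi-square quantile function (assuming SciPy is available; `variance_interval` is an illustrative name):

```python
from scipy import stats

def variance_interval(s2, n, conf=0.95):
    """Chi-square interval for sigma^2 under Normal sampling.  Note the
    inversion: the larger chi-square critical value produces the lower
    endpoint, and the smaller one produces the upper endpoint."""
    nu = n - 1
    alpha = 1 - conf
    chi2_small = stats.chi2.ppf(alpha / 2, nu)      # chi^2_{1-alpha/2, nu}
    chi2_large = stats.chi2.ppf(1 - alpha / 2, nu)  # chi^2_{alpha/2, nu}
    return nu * s2 / chi2_large, nu * s2 / chi2_small

# Example 6.3: n = 20 bottles, s^2 = 0.0153
lo, hi = variance_interval(0.0153, 20)
print(round(lo, 5), round(hi, 5))             # -> 0.00885 0.03264
print(round(lo**0.5, 4), round(hi**0.5, 4))   # square roots: interval for sigma
```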
Figure 6.2 — Chi-square reference distribution and the variance interval mechanism
The main message of this figure is that variance intervals come from a chi-square reference distribution, which is nonnegative and typically skewed. The purpose is to show exactly where the two chi-square critical points come from and how they map into the two endpoints of the confidence interval for \(\sigma^2\). The curve displayed is theoretical (a chi-square density), and the interval endpoints shown are computed from the variance-interval formula using a fixed illustrative value of \(s^2\) to keep the focus on the role of \(n\).
This figure does not rely on repeated sampling to draw a histogram, but the logic is still repeated-sampling logic. The chi-square distribution appears because, under Normal sampling, the scaled statistic \((n-1)S^2/\sigma^2\) has the same distribution in every repetition, even though \(S^2\) changes from sample to sample. Here \(n\) is the sample size used to compute \(s^2\), and the degrees of freedom are \(\nu=n-1\).
To read the figure, first select \(n\) (equivalently, \(\nu\)) from the dropdown. Then locate the two vertical lines; they mark the lower and upper chi-square critical values that cut off equal tail areas of \(\alpha/2\). The shaded middle region represents the central probability \(1-\alpha\), and it is this central probability statement that is algebraically rearranged into a confidence interval for \(\sigma^2\).
The statistical message is that the endpoints depend strongly on \(\nu\), which means the interval tightens as \(n\) increases. For small \(n\), the chi-square curve is more skewed, and the resulting interval for \(\sigma^2\) can be quite asymmetric, reflecting genuine uncertainty about variability from limited data. As \(n\) grows, the chi-square distribution becomes more symmetric, and the variance interval becomes narrower, which supports more decisive operational conclusions about process consistency.
6.3.4 Confidence interval for the ratio \(\sigma_1^2/\sigma_2^2\) (two Normal samples)
Comparing variability is a common management task. Examples include comparing two machines, two suppliers, or two logistics routes to determine which is more consistent. The appropriate parameter is often a ratio such as \(\sigma_1^2/\sigma_2^2\) or \(\sigma_1/\sigma_2\), because “twice as variable” is a ratio statement.
If two independent random samples are taken from Normal populations, then the ratio of scaled sample variances follows an F distribution:

\[
F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{v_1,\,v_2}, \qquad v_1 = n_1-1,\quad v_2 = n_2-1.
\]
Define \(f_{\alpha}(v_1,v_2)\) as the value such that \(P(F_{v_1,v_2}>f_{\alpha}(v_1,v_2))=\alpha\). A two-sided \(100(1-\alpha)\%\) confidence interval for \(\sigma_1^2/\sigma_2^2\) can be written as:

\[
\frac{s_1^2}{s_2^2}\,\frac{1}{f_{\alpha/2}(v_1,v_2)} \;<\; \frac{\sigma_1^2}{\sigma_2^2} \;<\; \frac{s_1^2}{s_2^2}\, f_{\alpha/2}(v_2,v_1)
\]
Because \(F\ge 0\) and is generally right-skewed, this interval is also not symmetric around \(s_1^2/s_2^2\). A confidence interval for \(\sigma_1/\sigma_2\) is obtained by taking the square root of both endpoints.
Example 6.4
A distribution center compares delivery-time consistency from two routing policies. Independent samples are collected under each policy, and delivery time is treated as approximately Normal within each policy because the process is stable and dominated by many small additive delays. The summary statistics are: policy 1 has \(n_1=15\) and sample standard deviation \(s_1=3.07\), while policy 2 has \(n_2=12\) and sample standard deviation \(s_2=0.80\).
Question: Construct a 98% confidence interval for \(\sigma_1^2/\sigma_2^2\) and interpret it.
The point estimate is \(s_1^2/s_2^2=(3.07^2)/(0.80^2)\approx 14.73\). With \(\alpha=0.02\), we use \(v_1=14\) and \(v_2=11\) and obtain the needed F critical values for the two tails. Substituting into the interval formula yields approximately \(3.43 \le \sigma_1^2/\sigma_2^2 \le 56.90\), and taking square roots gives approximately \(1.85 \le \sigma_1/\sigma_2 \le 7.54\).
Answer: A 98% confidence interval for the variance ratio is approximately \([3.43,\ 56.90]\). Because the interval is entirely above 1, policy 1 appears substantially more variable than policy 2, and the standard deviation under policy 1 could plausibly be between about 1.9 and 7.5 times that of policy 2.
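Example 6.4 can be checked the same way using SciPy's F quantiles (assuming SciPy is available; `variance_ratio_interval` is an illustrative name). Note the swapped degrees of freedom in the upper endpoint:

```python
from scipy import stats

def variance_ratio_interval(s1_sq, n1, s2_sq, n2, conf=0.98):
    """F-based interval for sigma1^2 / sigma2^2 from two independent
    Normal samples; each critical value's degrees of freedom must match
    its role in the numerator and denominator."""
    v1, v2 = n1 - 1, n2 - 1
    alpha = 1 - conf
    ratio = s1_sq / s2_sq
    f_right = stats.f.ppf(1 - alpha / 2, v1, v2)  # f_{alpha/2}(v1, v2)
    f_swap = stats.f.ppf(1 - alpha / 2, v2, v1)   # f_{alpha/2}(v2, v1)
    return ratio / f_right, ratio * f_swap

# Example 6.4: s1 = 3.07 with n1 = 15, s2 = 0.80 with n2 = 12
lo, hi = variance_ratio_interval(3.07**2, 15, 0.80**2, 12)
# lo is about 3.4 and hi about 57; the whole interval lies above 1,
# so equal variability is not plausible at the 98% level.
```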
Figure 6.3 — F reference distribution and the variance-ratio interval mechanism
The main message of this figure is that variance-ratio intervals are built from an F reference distribution, and the asymmetry of that distribution directly affects the asymmetry of the confidence interval. The purpose is to show how the two critical values are chosen and how they define the central \(1-\alpha\) probability region used in the interval derivation. The plotted curve is theoretical (an F density), and the displayed critical values are computed from the chosen degrees of freedom.
No real comparison dataset is used in the figure because the learning goal is the mechanism, not a specific case. The degrees of freedom \(v_1=n_1-1\) and \(v_2=n_2-1\) come from the two sample sizes that would be used to compute \(s_1^2\) and \(s_2^2\). Although the plot is not a repeated-sampling histogram, the central shaded region corresponds to the long-run statement that the F pivot falls between these cutoffs with probability \(1-\alpha\).
To read the figure, first choose a pair \((n_1,n_2)\) from the dropdown, which updates \((v_1,v_2)\). Then identify the two vertical lines; they mark the lower and upper F cutoffs that leave equal tail areas of \(\alpha/2\). The central shaded region is the probability mass used to form the inequality that becomes a confidence interval for \(\sigma_1^2/\sigma_2^2\) after algebraic rearrangement.
The statistical message is that small degrees of freedom produce a more skewed F curve, which leads to wider and more asymmetric variance-ratio intervals. As \(n_1\) and \(n_2\) increase, the distribution concentrates and the interval for \(\sigma_1^2/\sigma_2^2\) becomes tighter, enabling clearer operational comparisons. In practice, this figure also supports a key interpretation rule: if the resulting interval for \(\sigma_1^2/\sigma_2^2\) includes 1, then equal variability remains plausible at the stated confidence level.
6.4 Discussion and Common Errors
One common error for proportion intervals is treating the confidence level as a probability that \(p\) lies in the specific computed interval. The correct interpretation is long-run: if the same sampling method were repeated many times, the method would produce intervals that contain \(p\) about \(100(1-\alpha)\%\) of the time. Another frequent issue is using a large-sample Normal approximation when \(n\hat{p}\) or \(n\hat{q}\) is small, which can lead to misleadingly narrow intervals.
A common error for variance intervals is forgetting that the chi-square method requires approximate Normal sampling for the original data values, not merely a large \(n\). The statistic \((n-1)S^2/\sigma^2\) can deviate substantially from chi-square when the population distribution is far from Normal, which makes the reported confidence level unreliable. It is also easy to mix up which chi-square critical value goes in the denominator, so it is helpful to remember that the smaller chi-square value produces the larger variance endpoint because of inversion.
For variance-ratio intervals, the most frequent technical error is swapping the numerator and denominator when forming \(s_1^2/s_2^2\) while keeping \((v_1,v_2)\) fixed. The degrees of freedom must match the variance in the numerator and denominator, otherwise the F reference distribution is incorrect. A practical interpretation error is assuming that “not including 1” proves a large operational difference; the interval indicates statistical evidence, but operational significance should be judged relative to costs, specifications, or risk tolerances.
6.5 Summary
This module extended confidence interval methods from means to proportions and to variability parameters. The key change is that the reference distributions differ: Normal is used for large-sample proportion inference, while chi-square and F distributions are used for variance and variance-ratio inference under Normal sampling.
For a binomial proportion, \(\hat{p}\) is the point estimate and the interval width scales like \(1/\sqrt{n}\)
Sample size planning for \(p\) depends on a target error \(e\) and a working value for \(p(1-p)\)
For \(\sigma^2\), the pivot \((n-1)S^2/\sigma^2\) leads to a chi-square-based interval that is typically asymmetric
For \(\sigma_1^2/\sigma_2^2\), the F distribution governs the interval, and interpretation often focuses on whether 1 is included