2. Sampling Distributions II: Central Limit Theorem for the Sample Mean and Normal Approximation
2.0 Notation Table
| Notation | Meaning |
|---|---|
| \(X\) | One observation (measurement) |
| \(X_1,\ldots,X_n\) | Random sample (independent, same distribution) |
| \(n\) | Sample size |
| \(\bar{X}\) | Sample mean |
| \(\mu\) | Population mean |
| \(\sigma\) | Population standard deviation |
| \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\) | Standard error of \(\bar{X}\) |
| \(Z\) | Standardized sample mean |
| \(\mathcal{N}(a,b)\) | Normal with mean \(a\) and variance \(b\) |
2.1 Introduction
In Module 1, we treated statistics as random variables and emphasized that repeated sampling produces a distribution of possible statistic values. We also used the standard error to describe how much a statistic such as \(\bar{X}\) typically varies from sample to sample.
This module adds a new ingredient: the shape of the sampling distribution of \(\bar{X}\). Shape matters because probability calculations require a distributional model, especially when decisions depend on thresholds or service-level targets.
To anchor the discussion in operations, consider counter transaction completion time \(X\) (minutes) in a convenience-store setting. Such time measurements often satisfy \(X\ge 0\) and exhibit right-skewness, because many transactions are short but some require additional services and take much longer.
2.2 Learning Outcomes
After this session, we should be able to:
- Define the sampling distribution of the sample mean \(\bar{X}\)
- State the Central Limit Theorem (CLT) for \(\bar{X}\) and interpret “approximately Normal”
- Distinguish an exact Normal sampling distribution from a CLT-based approximation
- Use \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\) to quantify typical variation in \(\bar{X}\)
- Standardize \(\bar{X}\) using \(Z\) and approximate probabilities about averages
- Explain practical conditions that affect CLT accuracy (skewness, heavy tails, dependence)
2.3 Main Concepts
2.3.1 Sampling Distribution of the Sample Mean
Let \(X_1,\ldots,X_n\) be an independent random sample from a population with mean \(\mu\) and standard deviation \(\sigma\) (finite). The sample mean is
\[
\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i.
\]
The sampling distribution of \(\bar{X}\) is the distribution of \(\bar{X}\) across repeated samples of the same size \(n\). Two foundational results describe its center and spread:
\[
E(\bar{X})=\mu \qquad\text{and}\qquad \mathrm{Var}(\bar{X})=\frac{\sigma^2}{n}.
\]
Therefore, the standard deviation of \(\bar{X}\) is
\[
\sigma_{\bar{X}}=\sqrt{\mathrm{Var}(\bar{X})}=\frac{\sigma}{\sqrt{n}}.
\]
The quantity \(\sigma_{\bar{X}}\) is called the standard error of the sample mean. It describes the typical sample-to-sample discrepancy between \(\bar{X}\) and \(\mu\) under the stated sampling assumptions, and it shrinks as \(n\) increases.
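Both facts can be checked numerically. The sketch below simulates many repeated samples from a hypothetical right-skewed service-time model (an exponential with mean 1.24 minutes, for which \(\sigma=\mu\)); NumPy, the seed, and the model choice are assumptions of this sketch, not part of the module:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed population: exponential service times with
# mean mu = 1.24 minutes (for an exponential, sigma equals the mean).
mu = sigma = 1.24
n, reps = 36, 100_000

# Each row is one repeated sample of size n; each row mean is one draw of X-bar.
means = rng.exponential(mu, size=(reps, n)).mean(axis=1)

print(means.mean())        # close to mu = 1.24
print(means.std())         # close to sigma / sqrt(n), about 0.207
print(sigma / np.sqrt(n))  # theoretical standard error
```

Across 100,000 repetitions, the empirical center of the \(\bar{X}\) values matches \(\mu\) and their empirical spread matches \(\sigma/\sqrt{n}\), which is the content of the two results above.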
2.3.2 Exact Normal Baseline (When the Population Is Normal)
If the population is Normal, then the sampling distribution of \(\bar{X}\) is exactly Normal for any sample size. This is not an approximation and does not require “large \(n\)”:
\[
\bar{X}\sim\mathcal{N}\!\left(\mu,\frac{\sigma^2}{n}\right)\quad\text{for every }n\ge 1.
\]
This baseline is useful because it isolates the role of sample size in the spread of \(\bar{X}\) without introducing approximation error. In many operational settings, however, the distribution of \(X\) is not Normal, which motivates the CLT.
2.3.3 Central Limit Theorem for the Sample Mean
The Central Limit Theorem states that, under broad conditions, the sampling distribution of \(\bar{X}\) becomes close to a Normal distribution as \(n\) grows. A practical statement is as follows: if \(X_1,\ldots,X_n\) are independent with common mean \(\mu\) and finite variance \(\sigma^2\), then \(\bar{X}\) is approximately Normal for large \(n\).
Equivalently, the CLT is often expressed through the standardized sample mean \(Z\). Define
\[
Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}.
\]
When the CLT approximation is accurate, \(Z\) is approximately \(\mathcal{N}(0,1)\). The word “approximately” is essential: the approximation quality depends on \(n\) and on features of the population distribution (notably skewness and tail heaviness).
Figure 2.1 visualizes how the sampling distribution of \(\bar{X}\) changes with \(n\) when the population is right-skewed, as is common for service times. The figure is designed to show that the distribution of averages can look close to Normal even when individual observations do not.
In Figure 2.1, the data are simulated from a right-skewed time model rather than collected as a single real dataset, because the sampling distribution is defined through repetition. A repetition means drawing a fresh independent sample of size \(n\) from the same fixed population model and computing one value of \(\bar{X}\). In this figure, \(n\) is the number of transactions included in each average.
To read the figure, first select a sample size \(n\) from the dropdown and focus on the blue density histogram, which represents many repeated values of \(\bar{X}\). Then compare the histogram to the smooth gray reference curve, which is the Normal model with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\). The histogram is empirical (simulation output), while the curve is theoretical (Normal approximation).
The main message is that increasing \(n\) makes the histogram more symmetric and more closely aligned with the Normal reference curve, even though the underlying population is right-skewed. At the same time, the distribution becomes narrower because \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\) decreases as \(n\) increases. This matters operationally because decisions based on average performance become more stable with larger samples.
The practical implication is that Normal probability calculations about averages can be reasonable without assuming that individual observations are Normal. However, the approximation is typically weakest in the tails, so threshold decisions with small \(n\) deserve extra caution. This motivates the next idea: standardizing \(\bar{X}\) to evaluate probabilities on a common scale.
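The symmetrizing effect in Figure 2.1 can also be tracked numerically: with a skewed population, the skewness of simulated \(\bar{X}\) values shrinks as \(n\) grows. The sketch below assumes an exponential population (population skewness 2); the function name, seed, and repetition count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_skewness(n, reps=100_000):
    """Sample skewness of simulated X-bar values from an exponential population."""
    means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    return (z ** 3).mean()

# An exponential population has skewness 2; for sample means the skewness
# shrinks at roughly the rate 2 / sqrt(n), so the histogram symmetrizes.
skews = {n: mean_skewness(n) for n in (1, 4, 16, 64)}
for n, s in skews.items():
    print(n, round(s, 2))
```

The printed skewness values decrease steadily with \(n\), mirroring what the histogram shows visually as the dropdown sample size increases.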
2.3.4 Standardization and the Z Scale
Standardization converts \(\bar{X}\) into standard-error units so that different sample sizes can be compared on a common scale. The standardized sample mean is
\[
Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}.
\]
The numerator \(\bar{X}-\mu\) measures how far the observed average is from the population mean. The denominator \(\sigma/\sqrt{n}\) rescales that deviation by the typical sampling variation of \(\bar{X}\) under independent sampling.
Figure 2.2 shows the distribution of \(Z\) under the same right-skewed population model used in Figure 2.1. The purpose is to separate “shape convergence” (toward \(\mathcal{N}(0,1)\)) from the shrinking spread of \(\bar{X}\) as \(n\) increases.
In Figure 2.2, the data are simulated because the distribution of \(Z\) is defined by repeated sampling and repeated standardization. A repetition means drawing a new sample of size \(n\), computing \(\bar{X}\), and then computing \(Z\) from that sample. In this figure, \(n\) is again the number of observations per repetition.
To read the figure, first choose \(n\) and examine the blue density histogram of simulated \(Z\) values, which is the empirical sampling distribution on the standardized scale. Then compare it to the smooth gray curve, the standard Normal density \(\mathcal{N}(0,1)\). Because the x-axis is fixed, differences across \(n\) are interpreted as genuine changes in shape rather than rescaling artifacts.
The key message is that the standardized distribution becomes closer to the standard Normal as \(n\) increases, which is the operational form of the CLT used for probability calculations. For small \(n\), skewness and tail differences can remain visible even after standardization, indicating that “approximately Normal” may be a rough approximation. For larger \(n\), the alignment improves, supporting the practical use of standard Normal tail areas.
The practical implication is that many probability questions about \(\bar{X}\) can be converted into questions about \(Z\) when independence and finite variance are credible. When the approximation is poor, simulation-based calibration (as in the figure design) is a defensible diagnostic. This perspective also clarifies why tail-focused service guarantees may require larger sample sizes than center-focused summaries.
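The conversion from questions about \(\bar{X}\) to questions about \(Z\), and its simulation-based check, can be sketched directly. The gamma model below is an assumption chosen only to match the chapter's \(\mu=1.24\) and \(\sigma=0.99\); the sketch compares a simulated upper-tail probability of \(Z\) to its standard Normal target:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

mu, sigma, n, reps = 1.24, 0.99, 36, 100_000

# Hypothetical right-skewed population: a gamma model matched to the
# stated mean and standard deviation via its shape and scale.
shape = (mu / sigma) ** 2
scale = sigma ** 2 / mu

# Simulate many standardized sample means Z = (X-bar - mu) / (sigma / sqrt(n)).
z = (rng.gamma(shape, scale, size=(reps, n)).mean(axis=1) - mu) / (sigma / np.sqrt(n))

sim_tail = (z > 1.645).mean()                       # empirical P(Z > 1.645)
normal_tail = 1 - 0.5 * (1 + erf(1.645 / sqrt(2)))  # standard Normal target, ~0.05
print(round(sim_tail, 3), round(normal_tail, 3))
```

With a right-skewed population and \(n=36\), the simulated upper-tail probability sits slightly above the standard Normal value, a small discrepancy of exactly the kind the text attributes to slower convergence in the tails.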
2.3.5 When Is a Normal Approximation Reasonable?
The CLT is not a single numeric rule but a convergence principle, so “large enough \(n\)” depends on the underlying distribution and the purpose of the calculation. If the population distribution is symmetric and unimodal, a Normal approximation for \(\bar{X}\) can be reasonable even for small \(n\). If the population is strongly skewed or heavy-tailed, larger samples are typically required, especially for tail probabilities.
Operational sampling plans also affect validity. The CLT framework assumes independence, but observations collected consecutively in time can be positively correlated due to persistent system states (same cashier, sustained queue, similar service mix). Positive dependence inflates the true variability of \(\bar{X}\) relative to \(\sigma/\sqrt{n}\), so a Normal calculation can be overconfident even when \(n\) is numerically large.
Figure 2.3 emphasizes tail-probability accuracy rather than overall shape, because many management decisions are threshold-based. The figure compares a right-skewed population to a Normal baseline that matches the same \(\mu\) and \(\sigma\), and it tracks how approximation error changes with \(n\).
In Figure 2.3, the probabilities are estimated by simulation because tail accuracy is inherently about repeated trials under known truth. A repetition means drawing a fresh sample of size \(n\), computing \(\bar{X}\), and checking whether a threshold event occurs. In this figure, \(n\) is the number of observations used to compute each sample mean, and the event is evaluated across many repetitions for each \(n\).
To read the figure, start with the x-axis (sample size) and the y-axis (absolute approximation error). Each line compares the simulated tail probability of a standardized event to its standard Normal target, so the plotted values summarize discrepancy rather than raw probability. The plotted points are empirical (simulation), while the reference probability used to compute error is theoretical (standard Normal tail area).
The main message is that approximation error decreases as \(n\) increases, but the rate depends on the population shape. The Normal baseline line remains near zero across \(n\) because the Normal model for \(\bar{X}\) is exact when the population is Normal, up to simulation noise. The right-skewed line is larger for small \(n\) and typically shrinks with \(n\), reflecting that tails converge more slowly than central shape.
The practical implication is that “approximately Normal” is context-dependent: it may be adequate for rough central probabilities at moderate \(n\), but inadequate for strict tail guarantees unless \(n\) is larger. In operations, this affects how confidently one can interpret an extreme average as a signal of process change rather than sampling variation. The figure also motivates documenting the sampling mechanism, because dependence can mimic tail error even when \(n\) is large.
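The error-versus-\(n\) pattern described for Figure 2.3 can be sketched in a few lines: estimate the tail probability of the standardized mean by simulation for a skewed population and track its gap to the standard Normal tail area. The exponential model, the threshold 1.645, and the repetition counts are assumptions of this sketch:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

Z0 = 1.645
NORMAL_TAIL = 1 - 0.5 * (1 + erf(Z0 / sqrt(2)))  # standard Normal tail, ~0.05

def tail_error(n, reps=100_000):
    """|simulated P(Z > Z0) - Normal tail area| for means of n exponentials."""
    x = rng.exponential(1.0, size=(reps, n))  # skewed population, mu = sigma = 1
    z = (x.mean(axis=1) - 1.0) * np.sqrt(n)   # standardized sample mean
    return abs((z > Z0).mean() - NORMAL_TAIL)

errors = {n: tail_error(n) for n in (5, 20, 80)}
for n, e in errors.items():
    print(n, round(e, 3))
```

The absolute error shrinks as \(n\) grows but does not vanish quickly, which is why tail-based service guarantees warrant larger samples than center-based summaries.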
2.3.6 Example 2.1
A convenience store tracks transaction completion time \(X\) in minutes during a stable time window. Historical monitoring suggests a long-run mean \(\mu=1.24\) and standard deviation \(\sigma=0.99\) for individual transactions, with \(X\ge 0\). A supervisor samples \(n=36\) transactions and computes the average completion time \(\bar{X}\).
Question: What is the approximate probability that the sample average exceeds 1.50 minutes?
We want \(P(\bar{X}>1.50)\). Under independent sampling with finite variance, the CLT implies \(\bar{X}\approx \mathcal{N}(\mu,\sigma^2/n)\), so the standard error is \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\). Here,
\[
\sigma_{\bar{X}}=\frac{0.99}{\sqrt{36}}=\frac{0.99}{6}=0.165.
\]
Standardizing the threshold yields
\[
z=\frac{1.50-1.24}{0.165}\approx 1.58.
\]
Therefore,
\[
P(\bar{X}>1.50)\approx P(Z>1.58)\approx 0.057.
\]
The approximate probability is about 0.06, meaning an average above 1.50 minutes is uncommon but plausible under stable conditions. If such exceedances occur frequently across comparable periods, the pattern is more consistent with a process shift than with ordinary sampling variation.
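The arithmetic above can be reproduced with only the Python standard library; the Normal CDF below is built from the error function rather than a statistics package, and the variable names are illustrative:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF via the error function (standard library only)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 1.24, 0.99, 36
se = sigma / sqrt(n)     # 0.99 / 6 = 0.165
z = (1.50 - mu) / se     # about 1.58
p = 1 - normal_cdf(z)    # P(X-bar > 1.50), about 0.06
print(round(se, 3), round(z, 2), round(p, 3))
```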
2.3.7 Example 2.2
A manufacturing line produces light bulbs, and the lifetime \(X\) (hours) is approximately Normal under stable production conditions. The process target is a mean lifetime around 800 hours, and historical data support a standard deviation near 40 hours for individual bulbs. A quality engineer tests \(n=16\) bulbs and records the sample mean lifetime \(\bar{X}\).
Question: Assuming \(\mu=800\) and \(\sigma=40\), what is the probability that \(\bar{X}\) is less than 775 hours?
Because the population is approximately Normal, the sampling distribution of \(\bar{X}\) is exactly Normal for any \(n\), not merely approximately Normal. Thus \(\bar{X}\sim \mathcal{N}(\mu,\sigma^2/n)\) and the standard error is
\[
\sigma_{\bar{X}}=\frac{40}{\sqrt{16}}=10\ \text{hours}.
\]
Standardizing 775 gives
\[
z=\frac{775-800}{10}=-2.5.
\]
Hence,
\[
P(\bar{X}<775)=P(Z<-2.5)\approx 0.006.
\]
The probability is about 0.006, so an average below 775 hours is a rare event under the stated stable-process model. Observing such a result provides evidence that the process mean may be below 800, although formal decision rules for that claim belong to later inference modules.
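As with Example 2.1, the calculation fits in a few lines of standard-library Python:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 800.0, 40.0, 16
se = sigma / sqrt(n)   # 40 / 4 = 10 hours
z = (775 - mu) / se    # -2.5
p = normal_cdf(z)      # P(X-bar < 775), about 0.006
print(se, z, round(p, 4))
```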
2.3.8 Example 2.3
A manager estimates mean transaction time during a rush period by sampling \(n=30\) consecutive customers from 12:00 to 12:10. The method is operationally convenient because it avoids randomization and paperwork, and it produces a quick summary for reporting. However, the rush-period environment often persists for several minutes, which can make consecutive observations similar.
Question: Why can consecutive sampling during a rush increase uncertainty in \(\bar{X}\), even if \(n\) appears moderately large?
The CLT and the standard error formula \(\sigma/\sqrt{n}\) assume independence across \(X_1,\ldots,X_n\). During a rush, the same cashier, queue length, and service mix can create positive correlation among consecutive times, so the data contain less independent information than \(n\) suggests. As a result, the true variability of \(\bar{X}\) can exceed \(\sigma/\sqrt{n}\), and Normal calculations based on independence can understate the chance of extreme averages.
The concise conclusion is that “\(n=30\)” is not sufficient by itself to justify a Normal approximation if dependence is present. When sampling is consecutive in time, validating independence (or redesigning sampling) is part of making CLT-based probability statements credible.
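A minimal sketch of this effect, assuming an AR(1)-style model as a stand-in for correlated consecutive service times (the correlation level \(\rho=0.5\) and the marginal-variance-1 scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def rush_sample(n, rho):
    """Hypothetical AR(1)-style sequence: each value is pulled toward its
    predecessor, while the marginal variance stays 1 regardless of rho."""
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
    return x

n, reps, rho = 30, 10_000, 0.5
means = np.array([rush_sample(n, rho).mean() for _ in range(reps)])

true_sd = means.std()      # actual spread of X-bar under positive dependence
naive_se = 1 / np.sqrt(n)  # sigma / sqrt(n) with sigma = 1, about 0.183
print(round(true_sd, 3), round(naive_se, 3))
```

Under positive dependence the true spread of \(\bar{X}\) is noticeably larger than the naive \(\sigma/\sqrt{n}\), so independence-based Normal calculations understate the chance of extreme averages, exactly as the example warns.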
2.4 Discussion and Common Errors
1) Confusing variability of \(X\) with variability of \(\bar{X}\). Individual observations can be highly variable, especially when \(X\ge 0\) and the distribution is right-skewed. The average smooths variability, and its typical spread is controlled by \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\) rather than by \(\sigma\).
2) Treating the CLT as an exact Normality statement. The CLT provides an approximation for large \(n\), not an identity for every \(n\). When the population is strongly skewed or heavy-tailed, the approximation can be noticeably inaccurate for tail probabilities at small or moderate \(n\).
3) Ignoring the sampling mechanism. Independence is a structural assumption about how the data are collected, not a property guaranteed by having many observations. If consecutive observations are correlated, then \(\sigma/\sqrt{n}\) understates uncertainty and the “approximately Normal” model can be misleading in practice.
4) Over-trusting tail calculations. The tails of the distribution are typically the last part to be approximated well as \(n\) increases. When a decision depends on rare-event probabilities, larger samples or simulation-based checks are often justified.
2.5 Summary
- The sampling distribution of \(\bar{X}\) describes how sample averages vary across repeated samples of size \(n\)
- \(E(\bar{X})=\mu\) and \(\mathrm{Var}(\bar{X})=\sigma^2/n\), so the standard error is \(\sigma_{\bar{X}}=\sigma/\sqrt{n}\)
- If the population is Normal, then \(\bar{X}\) is exactly Normal for any \(n\)
- The CLT implies \(\bar{X}\) is approximately Normal for large \(n\) under independence and finite variance
- Standardization via \(Z=(\bar{X}-\mu)/(\sigma/\sqrt{n})\) supports Normal probability calculations on a common scale
- Skewness, heavy tails, dependence, and tail-focused decisions can reduce approximation accuracy and require caution