Reviewed by CalculatorApp.me Math Team
Population & sample standard deviation, variance, z-scores, and the empirical rule.
σ: Population SD · s: Sample SD · 68-95-99.7: Empirical Rule · z-score: Standardization
Free online standard deviation calculator — compute population and sample std dev, variance, CV, and more with step-by-step deviation tables and AI insights.
Standard deviation measures the amount of variation or dispersion in a dataset. A low standard deviation indicates data points cluster near the mean, while a high standard deviation indicates they spread over a wider range. It is the most widely used measure of statistical spread, reported in the same units as the original data.
There are two versions: population standard deviation (σ) divides by N when the data represents the entire population, and sample standard deviation (s) divides by N−1 (Bessel's correction) to provide an unbiased estimate from sample data. In practice, sample SD is far more common since we rarely have access to entire populations.
Standard deviation is the square root of variance. While variance has desirable mathematical properties (additive for independent variables), its units are squared — making it less intuitive. Standard deviation returns to original units: a dataset of test scores in points has a SD in points, not points².
σ = √[ Σᵢ(xᵢ − μ)² / N ]
Step-by-step:
1. Calculate mean: μ = Σxᵢ / N
2. Subtract mean from each value
3. Square each difference
4. Sum the squares
5. Divide by N
6. Take square root
Example: Data = {2, 4, 4, 4, 5, 5, 7, 9}
μ = (2+4+4+4+5+5+7+9)/8 = 40/8 = 5
Differences: -3,-1,-1,-1,0,0,2,4
Squares: 9,1,1,1,0,0,4,16 → sum=32
σ² = 32/8 = 4
σ = √4 = 2

Use σ only when you have data for the ENTIRE population — all test scores, all factory outputs, etc. This is rare in practice.
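The six steps above can be sketched in a few lines of Python (a minimal illustration; the standard library's `statistics.pstdev` computes the same quantity):

```python
import math

def population_sd(data):
    """Population standard deviation, following the six steps above."""
    n = len(data)
    mean = sum(data) / n                       # 1. calculate the mean
    squared = [(x - mean) ** 2 for x in data]  # 2-3. subtract mean, square
    variance = sum(squared) / n                # 4-5. sum squares, divide by N
    return math.sqrt(variance)                 # 6. take the square root

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```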
s = √[ Σᵢ(xᵢ − x̄)² / (n − 1) ]
Bessel's correction: divide by (n − 1)
• Compensates for using x̄ instead of μ
• Makes s² an unbiased estimator of σ²
• Note: s itself is slightly biased
Example: the same data, treated as a sample
x̄ = 5 (same)
Sum of squares = 32 (same)
s² = 32/(8−1) = 32/7 ≈ 4.571
s = √4.571 ≈ 2.138
As n → ∞, s → σ:
n = 10: n−1 = 9 (10% correction)
n = 100: n−1 = 99 (1% correction)
n = 1000: negligible difference
Almost always use sample SD in practice. Dividing by n−1 corrects the bias from estimating the mean from the same sample.
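In Python's standard library, `statistics.stdev` divides by n − 1 and `statistics.pstdev` divides by N, so the worked example can be checked directly:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(data)    # population: divides by N
sample_sd = statistics.stdev(data)  # sample: divides by n - 1 (Bessel)

print(pop_sd)               # 2.0
print(round(sample_sd, 3))  # 2.138
```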
Variance:
σ² = Σ(xᵢ − μ)² / N (population)
s² = Σ(xᵢ − x̄)² / (n−1) (sample)
Variance properties:
Var(X+Y) = Var(X) + Var(Y)
(if X,Y independent)
Var(cX) = c² × Var(X)
Var(X+c) = Var(X)
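These properties can be checked numerically (a small sketch; the scaling and shift rules are exact identities for any dataset, while additivity holds only in expectation for independent variables, so simulated values match it approximately):

```python
import random
import statistics

random.seed(1)
x = [random.gauss(0, 1) for _ in range(50_000)]
y = [random.gauss(0, 1) for _ in range(50_000)]
c = 3.0

vx = statistics.pvariance(x)

# Var(cX) = c^2 * Var(X) and Var(X + c) = Var(X): exact identities.
assert abs(statistics.pvariance([c * xi for xi in x]) - c**2 * vx) < 1e-6
assert abs(statistics.pvariance([xi + c for xi in x]) - vx) < 1e-6

# Var(X + Y) = Var(X) + Var(Y): true for independent X, Y,
# so a finite simulation matches only approximately.
vy = statistics.pvariance(y)
v_sum = statistics.pvariance([xi + yi for xi, yi in zip(x, y)])
print(round(v_sum, 2), round(vx + vy, 2))  # close, but not identical
```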
Coefficient of Variation (CV):
CV = (s / x̄) × 100%
Example:
Test A: mean=80, SD=10 → CV=12.5%
Test B: mean=50, SD=10 → CV=20%
Same SD, but B is more variable
relative to its mean.

| Range | % of Data | Probability Outside | Description | Example (μ=100, σ=15) |
|---|---|---|---|---|
| μ ± 1σ | 68.27% | 31.73% | Majority of data | 85–115 |
| μ ± 2σ | 95.45% | 4.55% | Nearly all data | 70–130 |
| μ ± 3σ | 99.73% | 0.27% | Virtually all data | 55–145 |
| μ ± 4σ | 99.9937% | 0.0063% | Extreme outliers | 40–160 |
| μ ± 5σ | 99.99994% | 0.00006% | One in 1.7 million | 25–175 |
| μ ± 6σ | 99.9999998% | 0.0000002% | Six Sigma quality | 10–190 |
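The table's percentages come straight from the normal CDF, which can be computed with the error function in Python's `math` module:

```python
import math

def normal_within(k):
    """P(|X - mu| <= k*sigma) for a normal distribution: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    inside = normal_within(k) * 100
    print(f"mu +/- {k} sigma: {inside:.4f}% inside, {100 - inside:.4f}% outside")
```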
| Context | Mean | SD | CV | Interpretation |
|---|---|---|---|---|
| IQ scores | 100 | 15 | 15% | 68% of people score 85–115 |
| SAT scores | 1060 | 195 | 18.4% | Score of 1255 = +1 SD = 84th %ile |
| Adult male height (US) | 5'9" | 2.8" | 4.0% | Low CV → heights cluster tightly |
| S&P 500 annual return | ~10% | ~16% | 160% | High CV → volatile investment |
| Body temperature | 98.6°F | 0.7°F | 0.7% | Very tight distribution |
| Manufacturing (Six Sigma) | target | process σ | <0.001% | 3.4 defects per million |
Abraham de Moivre derived the normal (bell) curve as an approximation to the binomial distribution. He showed that deviations from the mean follow a predictable pattern — the precursor to standard deviation.
Carl Friedrich Gauss formalized the method of least squares and the normal distribution of errors — demonstrating that measurement errors cluster around the mean with a characteristic spread. The Gaussian distribution bears his name.
Karl Pearson first used the term 'standard deviation' in a lecture, replacing the older 'mean error' and 'probable error.' He chose 'σ' as the symbol. This standardized the terminology still used worldwide.
William Sealy Gosset (pen name 'Student') developed the t-distribution for small samples, showing that sample SD with Bessel's correction follows a different distribution than population SD. His work at Guinness Brewery birthed modern small-sample statistics.
Pearson (1893) — Philosophical Transactions
Karl Pearson introduced the term and symbol σ, establishing standard deviation as the primary measure of statistical dispersion. His framework replaced the less precise 'probable error' and remains the foundation of descriptive statistics.
Gosset/'Student' (1908) — Biometrika
Published under the pseudonym 'Student' (Guinness wouldn't allow publication), this paper showed that sample statistics follow the t-distribution, not the normal distribution, for small samples. This was revolutionary for experimental science.
Taleb — The Black Swan (2007)
Nassim Taleb argued that standard deviation is misleading for fat-tailed distributions (finance, natural disasters). In these domains, extreme events occur far more frequently than the normal distribution predicts, making SD a potentially dangerous metric.
Motorola/GE — Six Sigma Program
Standard deviation and variance are interchangeable.
Variance (σ²) is in squared units — if data is in meters, variance is in m². Standard deviation (σ) is in the original units. They contain the same information but SD is interpretable. Variance is preferred in proofs and formulas because it's additive for independent variables.
A large standard deviation means the data is 'bad' or unreliable.
SD describes natural variation, not data quality. Human height has low SD (people are similar heights); income has high SD (enormous range). Neither is 'bad.' High SD simply means more spread. Context determines whether variation is problematic.
The empirical rule (68-95-99.7) applies to all data.
The 68-95-99.7 rule applies ONLY to normally distributed data. For skewed, bimodal, or heavy-tailed distributions, these percentages can be very different. Chebyshev's inequality provides weaker but universal bounds: at least 75% within ±2 SD for ANY distribution.
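Chebyshev's bound is one line of arithmetic; this sketch contrasts its guaranteed minimums with the normal-only empirical rule:

```python
import math

def chebyshev_min_within(k):
    """Chebyshev: at least 1 - 1/k^2 of ANY distribution lies
    within k standard deviations of its mean."""
    return 1 - 1 / k ** 2

def normal_within(k):
    """Exact fraction within k SDs, valid for normal data only."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    print(f"+/-{k} SD: Chebyshev >= {chebyshev_min_within(k):.1%}, "
          f"normal = {normal_within(k):.2%}")
# +/-2 SD: Chebyshev >= 75.0%, normal = 95.45%
# +/-3 SD: Chebyshev >= 88.9%, normal = 99.73%
```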
CV allows comparing variability between datasets with different units or scales. A stock with CV=30% is more volatile than one with CV=10% regardless of price.
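A quick sketch of the CV computation, using the two tests from the example above:

```python
def cv_percent(mean, sd):
    """Coefficient of variation: SD as a percentage of the mean."""
    return sd / mean * 100

# Same SD of 10 points, but Test B varies more relative to its mean.
print(cv_percent(80, 10))  # 12.5
print(cv_percent(50, 10))  # 20.0
```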
z = (x − μ) / σ (population)
z = (x − x̄) / s (sample)
Interpretation:
z = 0: at the mean
z = 1: 1 SD above the mean
z = −2: 2 SD below the mean
Example: SAT score = 1350, with mean (μ) = 1060 and SD (σ) = 195
z = (1350 − 1060) / 195 = 1.49 → 93.2nd percentile
Standardization converts any normal distribution to N(0,1), the standard normal, enabling comparison across scales.
Outlier detection:
|z| > 2: unusual (~5% chance)
|z| > 3: rare (~0.3% chance)
Z-scores transform raw data into a 'how many SDs from the mean?' scale, enabling comparison between SAT, GPA, height, weight — any normally distributed variable.
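The SAT example above can be reproduced in a few lines; `percentile_of_z` is a helper name chosen here, built on the standard normal CDF via `math.erf`:

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def percentile_of_z(z):
    """Standard normal CDF as a percentage, via the error function."""
    return 50 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(1350, 1060, 195)         # SAT example from above
print(round(z, 2))                   # 1.49
print(round(percentile_of_z(z), 1))  # 93.2
```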
R.A. Fisher rigorously proved why we divide by n−1 (Bessel's correction) using the concept of degrees of freedom — the sample mean 'uses up' one degree of freedom, leaving n−1 independent deviations.
Bill Smith at Motorola developed Six Sigma — using standard deviation as a quality metric. Achieving 6σ quality means only 3.4 defects per million opportunities. This framework was adopted by GE, Toyota, and hundreds of companies worldwide.
Six Sigma defines process quality as the number of standard deviations between the process mean and the nearest spec limit. At 6σ, only 3.4 defects per million. GE reported $2+ billion in savings from implementing Six Sigma methodologies.
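The 3.4-per-million figure follows from the Six Sigma convention of allowing a 1.5σ long-term drift in the process mean, which leaves a 4.5σ margin to the spec limit; the corresponding tail probability can be computed with `math.erfc`:

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Six Sigma convention: a 1.5 sigma long-term drift leaves
# 6 - 1.5 = 4.5 sigma to the nearest spec limit.
dpmo = upper_tail(6 - 1.5) * 1_000_000
print(round(dpmo, 1))  # 3.4 defects per million opportunities
```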
You always divide by n−1 for standard deviation.
Divide by N for population SD (when you have the entire population). Divide by n−1 for sample SD (when estimating from a sample, which is the usual case). Using the wrong denominator introduces systematic bias.
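A small simulation illustrates the bias (a sketch; the sample size of 5 and trial count are arbitrary choices): averaging the divide-by-n estimator over many samples lands systematically below the true variance, while the divide-by-(n−1) estimator centers on it.

```python
import random

random.seed(42)

def var(sample, ddof):
    """Variance with a configurable denominator: ddof=0 divides by n,
    ddof=1 divides by n - 1 (Bessel's correction)."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)

# Draw many small samples from a population with known variance 4.0.
samples = [[random.gauss(0, 2) for _ in range(5)] for _ in range(50_000)]
biased = sum(var(s, 0) for s in samples) / len(samples)    # divide by n
unbiased = sum(var(s, 1) for s in samples) / len(samples)  # divide by n - 1

print(round(biased, 2))    # ~3.2: below the true 4.0 by a factor (n-1)/n
print(round(unbiased, 2))  # ~4.0: centered on the true variance
```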