# A tibble: 160 × 2
   city  spend
   <chr> <dbl>
 1 Milan  38.8
 2 Milan  45.4
 3 Milan  81.2
 4 Milan  51.4
 5 Milan  52.6
 6 Milan  84.3
 7 Milan  59.2
 8 Milan  24.7
 9 Milan  36.3
10 Milan  41.1
# ℹ 150 more rows
Laboratory of Statistics and Mathematics 2025/2026
A retail chain asks:
Do customers from different cities spend differently in our stores?
Goal:
Does average spending differ across cities?
Method: ANOVA (analysis of variance), which compares the averages of 3+ groups.
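Intuition: the ANOVA test compares between-city variation (how far apart the city averages are) with within-city variation (how much customers in the same city differ from each other).
If between-city variation is large relative to within-city variation, the data give evidence that the group averages differ.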
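The comparison of between-city and within-city variation can be sketched by hand in R. This is only an illustration: the data frame `toy`, the city means, and the standard deviation below are simulated assumptions, not the lecture's data.

```r
# Simulated stand-in data (hypothetical; NOT the lecture's dfA or dfB)
set.seed(42)
cities <- c("Berlin", "Madrid", "Milan", "Paris")
toy <- data.frame(
  city  = rep(cities, each = 40),
  spend = rnorm(160, mean = rep(c(50, 54, 50, 60), each = 40), sd = 17)
)

grand_mean  <- mean(toy$spend)
group_means <- tapply(toy$spend, toy$city, mean)    # one average per city
group_sizes <- tapply(toy$spend, toy$city, length)  # customers per city

k <- length(group_means)  # number of groups
n <- nrow(toy)            # total number of observations

# Between-city variation: how far the city averages sit from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-city variation: how far each customer sits from their own city average
ss_within <- sum((toy$spend - group_means[as.character(toy$city)])^2)

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_manual   <- ms_between / ms_within

# The same F statistic as reported by R's built-in ANOVA table
f_builtin <- anova(lm(spend ~ city, data = toy))[["F value"]][1]
all.equal(f_manual, f_builtin)
```

The ratio `f_manual` is the F statistic: it grows when the city averages spread out or when customers within a city become more alike.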
Statistical Hypotheses:
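In the standard one-way ANOVA formulation, and writing $\mu_c$ for the mean spend in city $c$ (notation assumed here), the hypotheses for the four cities in the data would read:

```latex
\begin{align*}
H_0 &: \mu_{\text{Berlin}} = \mu_{\text{Madrid}} = \mu_{\text{Milan}} = \mu_{\text{Paris}} \\
H_1 &: \mu_i \neq \mu_j \ \text{for at least one pair of cities } i \neq j
\end{align*}
```

Rejecting $H_0$ says only that some difference exists, not which cities differ.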
Dataset dfA (large within-city variation):
# A tibble: 160 × 2
   city  spend
   <chr> <dbl>
 1 Milan  38.8
 2 Milan  45.4
 3 Milan  81.2
 4 Milan  51.4
 5 Milan  52.6
 6 Milan  84.3
 7 Milan  59.2
 8 Milan  24.7
 9 Milan  36.3
10 Milan  41.1
# ℹ 150 more rows
Dataset dfB (small within-city variation):
# A tibble: 160 × 2
   city  spend
   <chr> <dbl>
 1 Milan  48.3
 2 Milan  49.3
 3 Milan  54.7
 4 Milan  50.2
 5 Milan  50.4
 6 Milan  55.2
 7 Milan  51.4
 8 Milan  46.2
 9 Milan  47.9
10 Milan  48.7
# ℹ 150 more rows
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = spend ~ city, data = dfA)
$city
                  diff         lwr       upr     p adj
Madrid-Berlin  3.24975  -6.9557295 13.455229 0.8416206
Milan-Berlin   0.21175  -9.9937295 10.417229 0.9999436
Paris-Berlin   9.38625  -0.8192295 19.591729 0.0835101
Milan-Madrid  -3.03800 -13.2434795  7.167479 0.8665177
Paris-Madrid   6.13650  -4.0689795 16.341979 0.4036555
Paris-Milan    9.17450  -1.0309795 19.379979 0.0946626
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = spend ~ city, data = dfB)
$city
                 diff       lwr       upr     p adj
Madrid-Berlin  4.4980  2.117168  6.878832 0.0000137
Milan-Berlin   0.1615 -2.219332  2.542332 0.9980504
Paris-Berlin  10.0660  7.685168 12.446832 0.0000000
Milan-Madrid  -4.3365 -6.717332 -1.955668 0.0000294
Paris-Madrid   5.5680  3.187168  7.948832 0.0000001
Paris-Milan    9.9045  7.523668 12.285332 0.0000000
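The two Tukey tables show similar mean differences but very different interval widths. A minimal sketch of how such output could be produced with `aov()` and `TukeyHSD()` follows. The helper `simulate_spend()`, the city means, and the two standard deviations are hypothetical choices for illustration; the lecture's dfA and dfB are not reproduced here.

```r
# Hypothetical simulation: same city means, different within-city spread
set.seed(1)
cities     <- c("Berlin", "Madrid", "Milan", "Paris")
true_means <- c(50, 54, 50, 60)  # assumed values, for illustration only

simulate_spend <- function(sd) {
  data.frame(
    city  = factor(rep(cities, each = 40)),  # factor, as TukeyHSD() expects
    spend = rnorm(160, mean = rep(true_means, each = 40), sd = sd)
  )
}

noisy <- simulate_spend(sd = 17)  # large within-city variation
tight <- simulate_spend(sd = 4)   # small within-city variation

# Fit the one-way ANOVA, then run all pairwise comparisons
tukey_noisy <- TukeyHSD(aov(spend ~ city, data = noisy))
tukey_tight <- TukeyHSD(aov(spend ~ city, data = tight))

# Width of each 95% family-wise confidence interval
ci_width <- function(tk) tk$city[, "upr"] - tk$city[, "lwr"]
ci_width(tukey_noisy)
ci_width(tukey_tight)
```

With less within-city noise the intervals shrink, so differences of the same size become detectable.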
lm()
Call:
lm(formula = spend ~ city, data = dfA)
Residuals:
    Min      1Q  Median      3Q     Max
-40.692  -9.652  -1.070   8.530  64.378
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  50.6922     2.7788  18.243   <2e-16 ***
cityMadrid    3.2498     3.9298   0.827   0.4095
cityMilan     0.2118     3.9298   0.054   0.9571
cityParis     9.3862     3.9298   2.388   0.0181 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.57 on 156 degrees of freedom
Multiple R-squared: 0.04552, Adjusted R-squared: 0.02716
F-statistic: 2.48 on 3 and 156 DF, p-value: 0.0632
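The `cityParis` coefficient has an unadjusted p-value of 0.0181, while the Tukey table for dfA reports 0.0835 for Paris-Berlin. The sketch below shows how the two relate, using only numbers copied from the printed output and assuming a balanced design (40 customers per city, which is consistent with the reported standard errors).

```r
# Values copied from the lm() summary for dfA
est_paris <- 9.3862   # Paris minus Berlin (Berlin is the reference level)
se_diff   <- 3.9298   # standard error of that difference
df_resid  <- 156      # residual degrees of freedom
n_groups  <- 4        # number of cities

t_stat <- est_paris / se_diff

# Unadjusted two-sided p-value: treats this as the only comparison made
p_unadjusted <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

# Tukey-adjusted p-value: accounts for all pairwise comparisons at once,
# using the studentized range distribution (balanced design assumed)
p_tukey <- ptukey(abs(t_stat) * sqrt(2), nmeans = n_groups,
                  df = df_resid, lower.tail = FALSE)

# Half-width of the 95% family-wise Tukey interval
half_width <- qtukey(0.95, nmeans = n_groups, df = df_resid) / sqrt(2) * se_diff

c(p_unadjusted = p_unadjusted, p_tukey = p_tukey, half_width = half_width)
```

The adjusted p-value is larger because the Tukey procedure controls the error rate across all six pairwise comparisons.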
Analysis of Variance Table
Response: spend
           Df Sum Sq Mean Sq F value Pr(>F)
city        3   2298  765.88  2.4796 0.0632 .
Residuals 156  48183  308.87
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
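The entries of the ANOVA table can be checked by hand, which also shows how the table connects to the `lm()` summary. The sketch below uses the rounded sums of squares as printed, so results agree with the output only up to rounding.

```r
# Values copied from the Analysis of Variance Table for dfA
ss_city  <- 2298    # between-city sum of squares
ss_resid <- 48183   # within-city (residual) sum of squares
df_city  <- 3       # number of cities minus 1
df_resid <- 156     # observations minus number of cities

ms_city  <- ss_city / df_city    # mean square between cities
ms_resid <- ss_resid / df_resid  # mean square within cities

f_stat  <- ms_city / ms_resid    # F value in the table
p_value <- pf(f_stat, df_city, df_resid, lower.tail = FALSE)  # Pr(>F)

# Quantities reported by summary(lm(...)) follow from the same numbers
r_squared <- ss_city / (ss_city + ss_resid)  # Multiple R-squared
resid_se  <- sqrt(ms_resid)                  # Residual standard error

round(c(F = f_stat, p = p_value, R2 = r_squared, sigma = resid_se), 4)
```

Only about 4.6% of the variation in spend is explained by city in dfA, which is why the F test does not reach significance at the 5% level.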
ANOVA