hr0 <- read.csv(pathtofile, # must edit pathtofile
skip = 1,
col.names = c("first","second","sex","class"),
colClasses = c("numeric","numeric","factor","factor"))
hr0$hr <- hr0$first + hr0$secondSTAT 516 hw 8
Students in two different classes were asked to measure their heart rates. Each student counted heartbeats during two 30-second periods separated by several minutes and, moreover, recorded his or her sex. It is of interest to see whether the mean resting heart rate is different for men and women, as claimed. Since the data were collected in several classes, the following model is considered: Assume
The code below reads in a the data set HR_sex_combined.csv which can be downloaded here.
To make this homework simpler, work with a balanced version of the data created by the following code, which draws sub-samples of size
# fix the random seed so the same subsets
# are drawn every time the code runs
set.seed(2)
hr0$class_sex <- as.factor(paste(hr0$class,hr0$sex))
classes <- levels(hr0$class_sex)
ab <- length(classes)
n <- 8
hr <- data.frame()
for(i in 1:ab){
ind <- which(hr0$class_sex == classes[i])
hr <- rbind(hr,hr0[sample(ind,n),])
}
head(hr) first second sex class hr class_sex
24 32 31 f STAT_515_sp_2026 63 STAT_515_sp_2026 f
7 44 44 f STAT_515_sp_2026 88 STAT_515_sp_2026 f
28 40 33 f STAT_515_sp_2026 73 STAT_515_sp_2026 f
9 46 39 f STAT_515_sp_2026 85 STAT_515_sp_2026 f
2 38 34 f STAT_515_sp_2026 72 STAT_515_sp_2026 f
19 44 43 f STAT_515_sp_2026 87 STAT_515_sp_2026 f
1.
Make boxplots of the heart rate measurements for all class and sex combinations.
boxplot(hr ~ sex + class, data = hr)2.
Give the values of the
a <- nlevels(hr$sex)
b <- nlevels(hr$class)
y <- hr$hr
y... <- predict(lm(hr ~ 1,data = hr))
yi.. <- predict(lm(hr ~ sex,data = hr))
y.j. <- predict(lm(hr ~ class, data = hr))
SST <- sum((y - y...)**2)
SSA <- sum((yi.. - y...)**2)
SSB <- sum((y.j. - y...)**2)
SSE <- sum((y - (yi.. + y.j. - y...))**2)
MSA <- SSA / (a-1)
MSB <- SSB / (b-1)
MSE <- SSE / (a*b*n - a - b + 1)
FA <- MSA / MSE
FB <- MSB / MSE
pA <- 1 - pf(FA,a-1,a*b*n - a - b + 1)
pB <- 1 - pf(FB,b-1,a*b*n - a - b + 1)
# same as
lm_out <- lm(hr ~ sex + class, data = hr)
anova(lm_out)Analysis of Variance Table
Response: hr
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 175.78 175.781 1.6462 0.2096
class 1 63.28 63.281 0.5926 0.4476
Residuals 29 3096.66 106.781
For i) we obtain an F test statistic value
of 1.646181 with a p value of 0.2096391.
For ii) we obtain an F test statistic value
of 0.5926251 with a p value of 0.4476315.
3.
Obtain the method of moments estimators of the variance components in the model. Report the estimated standard deviations as well as any issues encountered.
sigma_B_sq <- (MSB - MSE) / (n*a)
sigma_e_sq <- MSE
sigma_B <- sqrt(sigma_B_sq)Warning in sqrt(sigma_B_sq): NaNs produced
sigma_e <- sqrt(sigma_e_sq)We have an issue computing the method of
moments estimator for the class effect variance (it is negative).
However, we obtain 10.3335 for the error standard devation.
4.
Give a 95% confidence interval for the difference (women minus men) in mean resting heart rate between the sexes. Construct the interval without using the lmer() function. Give an interpretation of your interval.
yf.. <- mean(y[hr$sex == "f"])
ym.. <- mean(y[hr$sex == "m"])
alpha <- 0.05
tval <- qt(1 - alpha/2,a*b*n - a - b + 1)
me <- tval * sqrt(MSE) * sqrt(2/(n*b))
lo <- yf.. - ym.. - me
up <- yf.. - ym.. + me
c(lo,up)[1] -2.784633 12.159633
A 95% confidence interval for the
difference (women minus men) is [-2.784633,12.15963].
5.
Obtain the REML estimators of the variance components in the model. Report the estimated standard deviations.
library(lmerTest)
lmer_out <- lmer(hr ~ sex + (1|class), data = hr)
lmer_outLinear mixed model fit by REML ['lmerModLmerTest']
Formula: hr ~ sex + (1 | class)
Data: hr
REML criterion at convergence: 230.3948
Random effects:
Groups Name Std.Dev.
class (Intercept) 0.00
Residual 10.26
Number of obs: 32, groups: class, 2
Fixed Effects:
(Intercept) sexm
73.750 -4.687
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
The REML estimates appear in the output above.
6.
Give a lmer() function. Give an interpretation of your interval.
ci <- ls_means(lmer_out,pairwise=T)
ciLeast Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
sexf - sexm 4.6875 3.6286 30 1.2918 -2.7230 12.0980 0.2063
Confidence level: 95%
Degrees of freedom method: Satterthwaite
A 95% confidence interval for the
difference (women minus men) is [-2.722996,12.098]. A similar
result is possible if one uses confint() on the output
of the lmer() function.
7.
Check the assumptions of the model.
# obtain fitted values and residuals
yhat <- predict(lmer_out)
ehat <- hr$hr - yhat
# normal QQ plot
qqnorm(scale(ehat))
abline(0,1)# residuals versus fitted values plot
plot(ehat ~ yhat)The normal QQ plot suggests the assumption
of normally distributed error terms is reasonable;
the residuals versus fitted values plot does not show
any dramatic differences in the variances of the
responses in the different groups.