STAT 516 hw 8

Author

Karl Gregory

Students in two different classes were asked to measure their heart rates. Each student counted heartbeats during two 30-second periods separated by several minutes and, moreover, recorded his or her sex. It is of interest to see whether the mean resting heart rate is different for men and women, as claimed. Since the data were collected in several classes, the following model is considered: Assume $Y_{i j k} = μ + τ_{i} + B_{j} + ε_{i j k},$ where $Y_{i j k}$ represents the heart rate (the sum of the two 30-second counts) for student $k$ of sex $i$ in class $j$ , $τ_{i}$ is the effect of sex $i$ , the $B_{j}$ are class effects assumed to be independent $N (0, σ_{B}^{2})$ random variables, and the $ε_{i j k}$ are independent error terms having the $N (0, σ_{ε}^{2})$ distribution for $i = 1, 2$ , $j = 1, 2$ , and $k = 1, \dots, n_{i j}$ . This is a randomized complete block design with replication—where “with replication” means that there is more than one experimental unit at each combination of block and factor level. No interaction term is included, because we have for this data set $a = 2$ and $b = 2$ , so that the interaction term would have degrees of freedom $(a - 1) (b - 1) = 1$ . This means that we would have to estimate the variance of the random interaction effect based on a single random realization; as it is not advisable to estimate a variance based on a single observation, we leave the interaction term out.

The code below reads in a the data set HR_sex_combined.csv which can be downloaded here.

hr0 <- read.csv(pathtofile, # must edit pathtofile
                skip = 1,
                col.names = c("first","second","sex","class"),
                colClasses = c("numeric","numeric","factor","factor"))

hr0$hr <- hr0$first + hr0$second

To make this homework simpler, work with a balanced version of the data created by the following code, which draws sub-samples of size $n = 8$ from each class and sex combination.

# fix the random seed so the same subsets 
# are drawn every time the code runs
set.seed(2) 

hr0$class_sex <- as.factor(paste(hr0$class,hr0$sex))
classes <- levels(hr0$class_sex)
ab <- length(classes)
n <- 8
hr <- data.frame()
for(i in 1:ab){
  
  ind <- which(hr0$class_sex == classes[i])
  hr <- rbind(hr,hr0[sample(ind,n),])  
  
}
head(hr)

   first second sex            class hr          class_sex
24    32     31   f STAT_515_sp_2026 63 STAT_515_sp_2026 f
7     44     44   f STAT_515_sp_2026 88 STAT_515_sp_2026 f
28    40     33   f STAT_515_sp_2026 73 STAT_515_sp_2026 f
9     46     39   f STAT_515_sp_2026 85 STAT_515_sp_2026 f
2     38     34   f STAT_515_sp_2026 72 STAT_515_sp_2026 f
19    44     43   f STAT_515_sp_2026 87 STAT_515_sp_2026 f

1.

Make boxplots of the heart rate measurements for all class and sex combinations.

boxplot(hr ~ sex + class, data = hr)

2.

Give the values of the $F$ test statistics and $p$ values for testing the null hypotheses of i) no difference in mean heart rate between the sexes and ii) no significant variation between classes. Interpret your findings based on i) and ii) assuming that the assumptions of the model are met.

a <- nlevels(hr$sex)
b <- nlevels(hr$class)

y <- hr$hr
y... <- predict(lm(hr ~ 1,data = hr))
yi.. <- predict(lm(hr ~ sex,data = hr))
y.j. <- predict(lm(hr ~ class, data = hr))

SST <- sum((y - y...)**2)
SSA <- sum((yi.. - y...)**2)
SSB <- sum((y.j. - y...)**2)
SSE <- sum((y - (yi.. + y.j. - y...))**2)

MSA <- SSA / (a-1)
MSB <- SSB / (b-1)
MSE <- SSE / (a*b*n - a - b + 1)

FA <- MSA / MSE
FB <- MSB / MSE

pA <- 1 - pf(FA,a-1,a*b*n - a - b + 1)
pB <- 1 - pf(FB,b-1,a*b*n - a - b + 1)

# same as
lm_out <- lm(hr ~ sex + class, data = hr)
anova(lm_out)

Analysis of Variance Table

Response: hr
          Df  Sum Sq Mean Sq F value Pr(>F)
sex        1  175.78 175.781  1.6462 0.2096
class      1   63.28  63.281  0.5926 0.4476
Residuals 29 3096.66 106.781

For i) we obtain an F test statistic value 
of 1.646181 with a p value of 0.2096391.

For ii) we obtain an F test statistic value 
of 0.5926251 with a p value of 0.4476315.

3.

Obtain the method of moments estimators of the variance components in the model. Report the estimated standard deviations as well as any issues encountered.

sigma_B_sq <- (MSB - MSE) / (n*a)
sigma_e_sq <- MSE

sigma_B <- sqrt(sigma_B_sq)

Warning in sqrt(sigma_B_sq): NaNs produced

sigma_e <- sqrt(sigma_e_sq)

We have an issue computing the method of 
moments estimator for the class effect variance (it is negative). 
However, we obtain 10.3335 for the error standard devation.

4.

Give a 95% confidence interval for the difference (women minus men) in mean resting heart rate between the sexes. Construct the interval without using the lmer() function. Give an interpretation of your interval.

yf.. <- mean(y[hr$sex == "f"])
ym.. <- mean(y[hr$sex == "m"])
alpha <- 0.05
tval <- qt(1 - alpha/2,a*b*n - a - b + 1)
me <- tval * sqrt(MSE) * sqrt(2/(n*b))
lo <- yf.. - ym.. - me
up <- yf.. - ym.. + me
c(lo,up)

[1] -2.784633 12.159633

A 95% confidence interval for the 
difference (women minus men) is [-2.784633,12.15963].

5.

Obtain the REML estimators of the variance components in the model. Report the estimated standard deviations.

library(lmerTest)
lmer_out <- lmer(hr ~ sex + (1|class), data = hr)
lmer_out

Linear mixed model fit by REML ['lmerModLmerTest']
Formula: hr ~ sex + (1 | class)
   Data: hr
REML criterion at convergence: 230.3948
Random effects:
 Groups   Name        Std.Dev.
 class    (Intercept)  0.00   
 Residual             10.26   
Number of obs: 32, groups:  class, 2
Fixed Effects:
(Intercept)         sexm  
     73.750       -4.687  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings

The REML estimates appear in the output above.

6.

Give a $95 %$ confidence interval for the difference in mean resting heart rate between men and women (women minus men) using the output of the lmer() function. Give an interpretation of your interval.

ci <- ls_means(lmer_out,pairwise=T)
ci

Least Squares Means table:

            Estimate Std. Error df t value   lower   upper Pr(>|t|)
sexf - sexm   4.6875     3.6286 30  1.2918 -2.7230 12.0980   0.2063

  Confidence level: 95%
  Degrees of freedom method: Satterthwaite

A 95% confidence interval for the 
difference (women minus men) is [-2.722996,12.098]. A similar 
result is possible if one uses confint() on the output 
of the lmer() function.

7.

Check the assumptions of the model.

# obtain fitted values and residuals
yhat <- predict(lmer_out)
ehat <- hr$hr - yhat

# normal QQ plot
qqnorm(scale(ehat))
abline(0,1)

# residuals versus fitted values plot
plot(ehat ~ yhat)

The normal QQ plot suggests the assumption 
of normally distributed error terms is reasonable; 
the residuals versus fitted values plot does not show 
any dramatic differences in the variances of the 
responses in the different groups.