STAT 516 hw 9
In this homework you will use data collected by the Developmental Dynamics Lab in the USC School of Public Health, run by Dr. Liz Will, who has kindly allowed me to participate in analyzing some of her data!
Download this data set, which is a simplified version of one of her data sets. These data were collected on children with Down syndrome, some having a congenital heart defect (CHD) and some not. Of interest was whether the presence of a CHD tends to affect the developmental trajectory of children with Down syndrome.
The columns in the data set are:
- id: An identifier for the child
- CHD: An indicator of whether the child was born with a CHD (1 = yes, 0 = no)
- Age: The age of the child
- GM: Gross motor score
- FM: Fine motor score
- VR: Visual reception score
- RL: Receptive language score
- EL: Expressive language score
Consider fitting an ANCOVA model to the expressive language scores of the two groups of children, those with Down syndrome and a congenital heart defect (DS+CHD) and those with Down syndrome and no congenital heart defect (DS), using age as a covariate.
The data set can be read into R with the code below:
ds <- read.csv(pathtofile)
1.
Make a scatterplot of the expressive language scores versus the ages of the children in the study. Use different plotting symbols for the DS+CHD and DS groups. Include a legend.
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), data = ds)
legend("topleft", pch = c(19, 1), legend = c("DS + CHD", "DS"))
2.
Make another scatterplot as above, but overlay the least-squares line for each group of children. Use different line types and update the legend to show the line types.
lm1 <- lm(EL ~ CHD + Age + CHD:Age, data = ds)
parms1 <- coef(lm1)
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), bty = "l", data = ds)
legend("topleft", pch = c(19, 1), lty = c(1, 2), legend = c("DS + CHD", "DS"), bty = "n")
# DS group (CHD = 0): baseline intercept and slope
abline(parms1[1], parms1[3], lty = 2)
# DS + CHD group: baseline plus the CHD and CHD:Age adjustments
abline(parms1[1] + parms1[2], parms1[3] + parms1[4], lty = 1)
3.
Perform a statistical test checking whether it is necessary to allow different slopes in the two groups. State your conclusion.
library(car)
Anova(lm1, type = "III")
Anova Table (Type III tests)
Response: EL
             Sum Sq Df F value  Pr(>F)
(Intercept)   0.682  1  0.1651 0.68660
CHD           3.919  1  0.9483 0.33573
Age          24.951  1  6.0369 0.01822 *
CHD:Age       3.942  1  0.9538 0.33434
Residuals   173.591 42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm1)
Call:
lm(formula = EL ~ CHD + Age + CHD:Age, data = ds)
Residuals:
    Min      1Q  Median      3Q     Max
-6.5353 -1.1470  0.4890  0.9692  3.9113
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0005     2.4627   0.406   0.6866
CHD           3.6831     3.7822   0.974   0.3357
Age           0.4958     0.2018   2.457   0.0182 *
CHD:Age      -0.2930     0.3000  -0.977   0.3343
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.033 on 42 degrees of freedom
Multiple R-squared: 0.1458, Adjusted R-squared: 0.08477
F-statistic: 2.389 on 3 and 42 DF, p-value: 0.08228
In both sets of output the p-value for the CHD:Age interaction is large (p = 0.334), so we fail to reject the null hypothesis of equal slopes. The data give no evidence that separate slopes are needed for the two groups.
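The same hypothesis can also be framed as a comparison of nested models. The sketch below is only an illustration on simulated data: the data frame `sim`, its group sizes, and all of its values are invented for the example and are not the study data. It checks that the partial F statistic from `anova()` agrees with the squared t statistic for the interaction term.

```r
# Simulated stand-in for the study data; every value here is hypothetical.
set.seed(516)
n <- 40
sim <- data.frame(
  CHD = rep(c(1, 0), times = c(20, 20)),
  Age = runif(n, min = 6, max = 18)
)
sim$EL <- 1 + 0.5 * sim$Age + rnorm(n, sd = 2)

# Reduced model (equal slopes) and full model (separate slopes).
fit_equal    <- lm(EL ~ CHD + Age, data = sim)
fit_separate <- lm(EL ~ CHD + Age + CHD:Age, data = sim)

# Partial F test of H0: the CHD:Age coefficient is zero.
nested   <- anova(fit_equal, fit_separate)
F_nested <- nested$F[2]
p_nested <- nested$`Pr(>F)`[2]

# With one numerator degree of freedom, F equals the squared t statistic.
t_inter <- coef(summary(fit_separate))["CHD:Age", "t value"]
print(c(F = F_nested, t_squared = t_inter^2, p = p_nested))
```

Because only one coefficient is being tested, the nested-model F test, the Type III F test, and the t test in `summary()` all give the same p-value.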
4.
Proceed assuming that the slope is the same in the two groups. Make once more a scatterplot of the expressive language scores versus age as before, but this time overlay the lines for the two groups fitted under the assumption of equal slopes. Include a legend.
lm2 <- lm(EL ~ CHD + Age, data = ds)
parms2 <- coef(lm2)
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), bty = "l", data = ds)
legend("topleft", pch = c(19, 1), lty = c(1, 2), legend = c("DS + CHD", "DS"), bty = "n")
# DS group (CHD = 0): baseline intercept, common slope
abline(parms2[1], parms2[3], lty = 2)
# DS + CHD group: shifted intercept, same slope
abline(parms2[1] + parms2[2], parms2[3], lty = 1)
5.
Check whether the assumptions of the equal-slopes ANCOVA model are satisfied.
plot(lm2, which = 1, add.smooth = FALSE)
plot(lm2, which = 2)
The residuals versus fitted values plot and the normal QQ plot show no serious departures from the constant-variance and normality assumptions.
6.
Using the equal-slopes ANCOVA model, test i) whether the age covariate contributes significantly to the variation in the expressive language scores and ii), having adjusted for age, whether there is a statistically significant difference in the mean expressive language scores of children in the DS+CHD and DS groups.
Anova(lm2, type = "III")
Anova Table (Type III tests)
Response: EL
             Sum Sq Df F value  Pr(>F)
(Intercept)   8.143  1  1.9724 0.16738
CHD           0.016  1  0.0039 0.95068
Age          24.459  1  5.9242 0.01916 *
Residuals   177.533 43
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We obtain a small p-value (p = 0.019) for testing whether the slope coefficient is equal to zero, suggesting that the age covariate contributes significantly to the variation in the expressive language scores. On the other hand, we obtain a large p-value (p = 0.951) when testing for an effect of a congenital heart defect, so there is insufficient evidence in the data, after accounting for the ages of the children, to conclude that mean expressive language scores differ between children in the DS+CHD and DS groups.
7.
Using the model with equal slopes in the two groups, report the age-adjusted means (adjusted to the mean age of all the children in the data set) along with 95% confidence intervals.
N <- nrow(ds)
a <- 2                       # number of groups
dferror <- N - a - 1         # error df in the equal-slopes model
MSerror <- sum(residuals(lm2)^2) / dferror
n1 <- sum(ds$CHD == 1)       # DS + CHD
n2 <- sum(ds$CHD == 0)       # DS
x <- ds$Age
x1. <- mean(x[ds$CHD == 1])  # with CHD
x2. <- mean(x[ds$CHD == 0])  # without CHD
x.. <- mean(x)
# within-group (error) sum of squares of age
Exx <- sum((x[ds$CHD == 1] - x1.)^2) + sum((x[ds$CHD == 0] - x2.)^2)
y <- ds$EL
y1. <- mean(y[ds$CHD == 1])
y2. <- mean(y[ds$CHD == 0])
bhat <- parms2[3]            # common slope
alpha <- 0.05
tval <- qt(1 - alpha/2, dferror)
# margins of error; these already include the t quantile
me1 <- tval * sqrt(MSerror) * sqrt(1/n1 + (x1. - x..)^2/Exx)
me2 <- tval * sqrt(MSerror) * sqrt(1/n2 + (x2. - x..)^2/Exx)
# age-adjusted means
y1.adj <- y1. - bhat*(x1. - x..)
y2.adj <- y2. - bhat*(x2. - x..)
# 95% confidence limits
lo1 <- y1.adj - me1
up1 <- y1.adj + me1
lo2 <- y2.adj - me2
up2 <- y2.adj + me2
The estimated means after adjustment to the overall mean age of 12.44 months were 7.15 for the DS + CHD group and 7.11 for the DS group, with 95% confidence intervals [6.32, 7.97] and [6.21, 8.01], respectively.
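For reference, the standard equal-slopes ANCOVA formulas behind these quantities can be written out as follows. The notation is introduced here for illustration: $\bar{y}_{i\cdot}$ and $\bar{x}_{i\cdot}$ are the group means of EL and Age, $\bar{x}_{\cdot\cdot}$ is the overall mean age, $\hat{\beta}$ is the common slope, $MS_E$ is the error mean square with $N - a - 1$ degrees of freedom, and $E_{xx}$ is the within-group sum of squares of Age.

```latex
\begin{align*}
\bar{y}_{i,\mathrm{adj}} &= \bar{y}_{i\cdot} - \hat{\beta}\,(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}), \\
E_{xx} &= \sum_{i=1}^{a}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i\cdot})^2, \\
\bar{y}_{i,\mathrm{adj}} &\pm t_{\alpha/2,\,N-a-1}
  \sqrt{MS_E\left[\frac{1}{n_i} + \frac{(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2}{E_{xx}}\right]}.
\end{align*}
```

The t quantile enters the interval exactly once, as the multiplier of the standard error.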
8.
Again using the equal-slopes model, obtain a 95% confidence interval for the difference in age-adjusted means. Give an interpretation of your interval.
me12 <- tval * sqrt(MSerror) * sqrt(1/n1 + 1/n2 + (x1. - x2.)^2/Exx)
# me12 already includes the t quantile, so it is added and subtracted directly
lo12 <- y1.adj - y2.adj - me12
up12 <- y1.adj - y2.adj + me12
The estimated difference in the age-adjusted means (DS+CHD minus DS) is 0.04, and a 95% confidence interval for the true difference in age-adjusted means is [-1.20, 1.27]. We are 95% confident that, at the same age, the mean expressive language score of children in the DS+CHD group lies between 1.20 points below and 1.27 points above that of children in the DS group. Since the interval includes zero, we cannot conclude that there is any difference, after adjusting for age, between the mean expressive language scores of children in the DS+CHD and DS groups.
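As a cross-check on hand calculations of this kind, the age-adjusted means and the interval for their difference can also be obtained from built-in functions. The sketch below again uses simulated data (the data frame `sim` and all of its values are hypothetical, not the study data). It compares `predict()` at the overall mean age with the hand formula for one group, and reads the interval for the difference from `confint()`.

```r
# Simulated stand-in for the study data; every value here is hypothetical.
set.seed(516)
sim <- data.frame(
  CHD = rep(c(1, 0), times = c(20, 20)),
  Age = runif(40, min = 6, max = 18)
)
sim$EL <- 1 + 0.5 * sim$Age + rnorm(40, sd = 2)
fit <- lm(EL ~ CHD + Age, data = sim)

# Age-adjusted means: fitted group means at the overall mean age.
at_mean_age <- data.frame(CHD = c(1, 0), Age = mean(sim$Age))
adj <- predict(fit, newdata = at_mean_age, interval = "confidence", level = 0.95)
rownames(adj) <- c("DS+CHD", "DS")

# The difference in adjusted means is the CHD coefficient.
diff_ci <- confint(fit, "CHD", level = 0.95)

# Hand formula for the DS+CHD group, using the within-group sum of squares.
g1  <- sim$CHD == 1
Exx <- sum((sim$Age[g1] - mean(sim$Age[g1]))^2) +
  sum((sim$Age[!g1] - mean(sim$Age[!g1]))^2)
me1 <- qt(0.975, fit$df.residual) * summary(fit)$sigma *
  sqrt(1 / sum(g1) + (mean(sim$Age[g1]) - mean(sim$Age))^2 / Exx)
hand1 <- mean(sim$EL[g1]) - coef(fit)["Age"] * (mean(sim$Age[g1]) - mean(sim$Age))

print(adj)
print(diff_ci)
```

Agreement between `adj["DS+CHD", ]` and `hand1` plus or minus `me1` is a quick way to confirm that the t quantile has been applied once and that the right sum of squares is in the denominator.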