STAT 516 hw 9

Author

Karl Gregory

In this homework you will use data collected by the Developmental Dynamics Lab in the USC School of Public Health, run by Dr. Liz Will, who has kindly allowed me to participate in analyzing some of her data!

Download this data set, which is a simplified version of one of her data sets. These data were collected on children with Down syndrome, some with a congenital heart defect (CHD) and some without. Of interest was whether the presence of a CHD tends to affect the developmental trajectory of children with Down syndrome.

The columns in the data set are:

Consider fitting an ANCOVA model, with age as a covariate, to the expressive language scores of two groups of children: those with Down syndrome and a congenital heart defect (DS+CHD) and those with Down syndrome and no congenital heart defect (DS).
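For reference, the separate-slopes ANCOVA model fitted in part 2 can be written as below, where j indexes children and CHD_j is the 0/1 indicator of a congenital heart defect used in the code. The notation is chosen here for illustration; the coefficients line up, in order, with `coef(lm1)`, and the equal-slopes model used from part 4 on is the special case gamma = 0.

```latex
\mathrm{EL}_j = \beta_0 + \tau\,\mathrm{CHD}_j + \beta_1\,\mathrm{Age}_j
  + \gamma\,\mathrm{CHD}_j\,\mathrm{Age}_j + \varepsilon_j,
\qquad \varepsilon_j \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)

\text{DS: } \beta_0 + \beta_1\,\mathrm{Age}, \qquad
\text{DS+CHD: } (\beta_0 + \tau) + (\beta_1 + \gamma)\,\mathrm{Age}
```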

The data set can be read into R with the code below:

ds <- read.csv(pathtofile)  # pathtofile: path to the downloaded CSV file

1.

Make a scatterplot of the expressive language scores versus the ages of the children in the study. Use different plotting symbols for the DS+CHD and DS groups. Include a legend.

# Filled circles for DS+CHD (CHD == 1), open circles for DS (CHD == 0)
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), data = ds)
legend("topleft", pch = c(19, 1), legend = c("DS + CHD", "DS"))

2.

Make another scatterplot as above, but overlay the least-squares line for each group of children. Use different line types and update the legend to show the line types.

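# Separate-slopes ANCOVA: CHD shifts the intercept and CHD:Age shifts the slope
lm1 <- lm(EL ~ CHD + Age + CHD:Age, data = ds)
parms1 <- coef(lm1)  # (Intercept), CHD, Age, CHD:Age
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), bty = "l", data = ds)
legend("topleft", pch = c(19, 1), lty = c(1, 2), legend = c("DS + CHD", "DS"), bty = "n")
abline(parms1[1], parms1[3], lty = 2)                          # DS line
abline(parms1[1] + parms1[2], parms1[3] + parms1[4], lty = 1)  # DS+CHD line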
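Because the interaction model gives each group its own intercept and slope, the two lines drawn above coincide with separate least-squares fits to each group. A minimal sketch of that check follows; the data frame `sim` is simulated, hypothetical stand-in data with the same column names as `ds` (EL, Age, and a 0/1 CHD indicator), not the lab's data.

```r
# Hypothetical simulated stand-in for ds: EL, Age, and a 0/1 CHD indicator
set.seed(1)
sim <- data.frame(CHD = rep(c(1, 0), times = c(20, 18)),
                  Age = runif(38, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(38, sd = 2)

# Separate-slopes ANCOVA, parameterized like lm1
fit_int <- lm(EL ~ CHD + Age + CHD:Age, data = sim)
b <- coef(fit_int)

# Intercept and slope of each group's line implied by the interaction model
ds_line  <- c(b[["(Intercept)"]], b[["Age"]])
chd_line <- c(b[["(Intercept)"]] + b[["CHD"]], b[["Age"]] + b[["CHD:Age"]])

# Ordinary least-squares fits to each group on its own
fit_ds  <- lm(EL ~ Age, data = subset(sim, CHD == 0))
fit_chd <- lm(EL ~ Age, data = subset(sim, CHD == 1))

print(rbind(implied = ds_line,  separate = unname(coef(fit_ds))))
print(rbind(implied = chd_line, separate = unname(coef(fit_chd))))
```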

3.

Perform a statistical test checking whether it is necessary to allow different slopes in the two groups. State your conclusion.

library(car)  # provides Anova() for Type III tests
Anova(lm1,type="III")
Anova Table (Type III tests)

Response: EL
             Sum Sq Df F value  Pr(>F)  
(Intercept)   0.682  1  0.1651 0.68660  
CHD           3.919  1  0.9483 0.33573  
Age          24.951  1  6.0369 0.01822 *
CHD:Age       3.942  1  0.9538 0.33434  
Residuals   173.591 42                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm1)

Call:
lm(formula = EL ~ CHD + Age + CHD:Age, data = ds)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.5353 -1.1470  0.4890  0.9692  3.9113 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.0005     2.4627   0.406   0.6866  
CHD           3.6831     3.7822   0.974   0.3357  
Age           0.4958     0.2018   2.457   0.0182 *
CHD:Age      -0.2930     0.3000  -0.977   0.3343  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.033 on 42 degrees of freedom
Multiple R-squared:  0.1458,    Adjusted R-squared:  0.08477 
F-statistic: 2.389 on 3 and 42 DF,  p-value: 0.08228
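The p-value for the interaction term is 0.334 in both sets of output above (the Type III F test and the t test for the CHD:Age coefficient are equivalent, with F = t^2). Since this p-value is large, we fail to reject the null hypothesis of equal slopes: the data give no evidence that the two groups need different slopes.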
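The same conclusion can be reached by comparing the equal-slopes and separate-slopes models directly with a nested-model F test. Because the two models differ by a single parameter, this F statistic is the square of the t statistic for the interaction coefficient and has the same p-value. A sketch using only base R, on hypothetical simulated data (`sim`, not the study data):

```r
# Hypothetical simulated stand-in for ds
set.seed(2)
sim <- data.frame(CHD = rep(c(1, 0), times = c(20, 18)),
                  Age = runif(38, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(38, sd = 2)

fit_eq  <- lm(EL ~ CHD + Age, data = sim)            # common slope
fit_int <- lm(EL ~ CHD + Age + CHD:Age, data = sim)  # group-specific slopes

# Extra-sum-of-squares F test for the CHD:Age term
cmp <- anova(fit_eq, fit_int)
F_nested <- cmp$F[2]
p_nested <- cmp[["Pr(>F)"]][2]

# t test for the interaction coefficient in the larger model
coefs <- summary(fit_int)$coefficients
t_int <- coefs["CHD:Age", "t value"]
p_int <- coefs["CHD:Age", "Pr(>|t|)"]

cat("nested F =", F_nested, " t^2 =", t_int^2, "\n")
cat("p-values:", p_nested, p_int, "\n")
```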

4.

Proceed assuming that the slope is the same in the two groups. Make another scatterplot of expressive language score versus age as before, but this time overlay the lines for the two groups fitted under the assumption of equal slopes. Include a legend.

# Equal-slopes ANCOVA: CHD shifts only the intercept
lm2 <- lm(EL ~ CHD + Age, data = ds)
parms2 <- coef(lm2)  # (Intercept), CHD, Age
plot(EL ~ Age, pch = ifelse(CHD == 1, 19, 1), bty = "l", data = ds)
legend("topleft", pch = c(19, 1), lty = c(1, 2), legend = c("DS + CHD", "DS"), bty = "n")
abline(parms2[1], parms2[3], lty = 2)              # DS line
abline(parms2[1] + parms2[2], parms2[3], lty = 1)  # DS+CHD line, same slope
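Under the equal-slopes model, the fitted common slope is a weighted average of the two within-group slopes, with weights given by each group's within-group sum of squares of age. The group with more spread in age therefore pulls the common slope toward its own. A sketch of that identity, again on hypothetical simulated data rather than `ds`:

```r
# Hypothetical simulated stand-in for ds
set.seed(3)
sim <- data.frame(CHD = rep(c(1, 0), times = c(20, 18)),
                  Age = runif(38, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(38, sd = 2)

fit_eq <- lm(EL ~ CHD + Age, data = sim)

# Within-group sum of squares of age, one per CHD group
Sxx <- tapply(sim$Age, sim$CHD, function(a) sum((a - mean(a))^2))
# Least-squares slope of EL on Age within each CHD group
slopes <- sapply(split(sim, sim$CHD),
                 function(g) coef(lm(EL ~ Age, data = g))[["Age"]])

# Sxx-weighted average of the group slopes
pooled <- sum(Sxx * slopes) / sum(Sxx)
print(c(weighted_average = pooled, common_slope = coef(fit_eq)[["Age"]]))
```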

5.

Check whether the assumptions of the equal-slopes ANCOVA model are satisfied.

plot(lm2, which = 1, add.smooth = FALSE)  # residuals vs fitted values

plot(lm2, which = 2)  # normal QQ plot of standardized residuals

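Neither the residuals versus fitted values plot nor the normal QQ plot shows a serious departure from the assumptions of linearity, constant error variance, and normally distributed errors; the equal-slopes assumption itself was checked in part 3.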
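The plots can be supplemented, though not replaced, by a couple of numerical checks: a Shapiro-Wilk test of the residuals, and a comparison of the residual spread in the two groups, since the model assumes a common error variance. With a sample of this size such tests have limited power, so they are best read alongside the plots. A sketch on hypothetical simulated data (`sim`, not `ds`):

```r
# Hypothetical simulated stand-in for ds
set.seed(4)
sim <- data.frame(CHD = rep(c(1, 0), times = c(20, 18)),
                  Age = runif(38, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(38, sd = 2)

fit_eq <- lm(EL ~ CHD + Age, data = sim)
res <- residuals(fit_eq)

sw <- shapiro.test(res)            # test of normality of the residuals
spread <- tapply(res, sim$CHD, sd) # residual SD within each CHD group
ratio <- max(spread) / min(spread) # ratios far above 1 would hint at unequal variances

print(sw)
print(spread)
cat("ratio of residual SDs:", ratio, "\n")
```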

6.

Using the equal-slopes ANCOVA model, test i) whether the age covariate contributes significantly to the variation in the expressive language scores and ii), having adjusted for age, whether there is a statistically significant difference in the mean expressive language scores of children in the DS+CHD and DS groups.

Anova(lm2,type="III")
Anova Table (Type III tests)

Response: EL
             Sum Sq Df F value  Pr(>F)  
(Intercept)   8.143  1  1.9724 0.16738  
CHD           0.016  1  0.0039 0.95068  
Age          24.459  1  5.9242 0.01916 *
Residuals   177.533 43                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We obtain a small p-value (0.019) for testing whether the common slope for age is equal to zero, suggesting that age contributes significantly to the variation in the expressive language scores. On the other hand, we obtain a large p-value (0.951) when testing for an effect of a congenital heart defect, so after accounting for the ages of the children there is insufficient evidence in the data to conclude that mean expressive language scores differ between children in the DS+CHD and DS groups.
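Because the equal-slopes model has no interaction, its Type III tests for CHD and Age are simply the F tests for dropping each term from the full model, which base R's `drop1()` reproduces without the car package; each F is also the square of the corresponding coefficient's t statistic. A sketch on hypothetical simulated data:

```r
# Hypothetical simulated stand-in for ds
set.seed(6)
sim <- data.frame(CHD = rep(c(1, 0), times = c(20, 18)),
                  Age = runif(38, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(38, sd = 2)

fit_eq <- lm(EL ~ CHD + Age, data = sim)

# F test for removing each term from the equal-slopes model
d <- drop1(fit_eq, test = "F")
print(d)

# Squared t statistics for the same coefficients
coefs <- summary(fit_eq)$coefficients
t2 <- coefs[c("CHD", "Age"), "t value"]^2
print(cbind(F_drop1 = d[c("CHD", "Age"), "F value"], t_squared = t2))
```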

7.

Using the model with equal slopes in the two groups, report the age-adjusted means (adjusted to the mean age of all the children in the data set) along with 95% confidence intervals.

N <- nrow(ds)
a <- 2                    # number of groups
dferror <- N - a - 1      # error df of the equal-slopes model
MSerror <- sum(lm2$residuals^2) / dferror

n1 <- sum(ds$CHD == 1)    # DS+CHD group size
n2 <- sum(ds$CHD == 0)    # DS group size
x <- ds$Age
x1. <- mean(x[ds$CHD == 1]) # with CHD
x2. <- mean(x[ds$CHD == 0])
x.. <- mean(x)
# Within-group sum of squares of age (E_xx); this determines Var(bhat)
Exx <- sum((x[ds$CHD == 1] - x1.)^2) + sum((x[ds$CHD == 0] - x2.)^2)

y <- ds$EL
y1. <- mean(y[ds$CHD == 1])
y2. <- mean(y[ds$CHD == 0])

bhat <- parms2[3]         # common slope for age

alpha <- 0.05
tval <- qt(1 - alpha/2, dferror)
# Margins of error; the t quantile is already included here
me1 <- tval * sqrt(MSerror) * sqrt(1/n1 + (x1. - x..)^2/Exx)
me2 <- tval * sqrt(MSerror) * sqrt(1/n2 + (x2. - x..)^2/Exx)

y1.adj <- y1. - bhat*(x1. - x..)
y2.adj <- y2. - bhat*(x2. - x..)

lo1 <- y1.adj - me1
up1 <- y1.adj + me1

lo2 <- y2.adj - me2
up2 <- y2.adj + me2

The estimated means after adjustment to the overall mean age of 12.43739 months were 7.147849 for the DS + CHD group and 7.109703 for the DS group, with 95% confidence intervals of approximately [6.32, 7.97] and [6.21, 8.01], respectively.
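The adjusted means and their intervals can be cross-checked against `predict()`. Under the equal-slopes model, the fitted value for a group at the overall mean age is exactly that group's adjusted mean, and its standard error is sqrt(MSerror * (1/n_i + (xbar_i - xbar)^2 / E_xx)), where E_xx is the within-group sum of squares of the covariate. A sketch on hypothetical simulated data (`sim`), not the study data:

```r
# Hypothetical simulated stand-in for ds
set.seed(7)
n1 <- 20; n2 <- 18
sim <- data.frame(CHD = rep(c(1, 0), times = c(n1, n2)),
                  Age = runif(n1 + n2, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(n1 + n2, sd = 2)
fit_eq <- lm(EL ~ CHD + Age, data = sim)

x <- sim$Age; y <- sim$EL; g <- sim$CHD
x_bar <- mean(x)                                         # overall mean age
bhat  <- coef(fit_eq)[["Age"]]                           # common slope
mse   <- sum(residuals(fit_eq)^2) / df.residual(fit_eq)  # MSerror
tval  <- qt(0.975, df.residual(fit_eq))
Exx   <- sum(tapply(x, g, function(a) sum((a - mean(a))^2)))  # within-group SS of age

# Adjusted mean and 95% CI for one group; the t quantile is applied once
adj_ci <- function(grp) {
  xi <- mean(x[g == grp]); yi <- mean(y[g == grp]); ni <- sum(g == grp)
  est <- yi - bhat * (xi - x_bar)
  me  <- tval * sqrt(mse * (1 / ni + (xi - x_bar)^2 / Exx))
  c(fit = est, lwr = est - me, upr = est + me)
}
manual <- rbind(adj_ci(1), adj_ci(0))  # rows: DS+CHD, DS

# predict() at the overall mean age gives the same numbers
via_predict <- predict(fit_eq, newdata = data.frame(CHD = c(1, 0), Age = x_bar),
                       interval = "confidence", level = 0.95)
print(manual)
print(via_predict)
```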

8.

Again using the equal-slopes model, obtain a 95% confidence interval for the difference in age-adjusted means. Give an interpretation of your interval.

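# Margin of error for the difference; tval is already included in me12
me12 <- tval * sqrt(MSerror) * sqrt(1/n1 + 1/n2 + (x1. - x2.)^2/Exx)
lo12 <- y1.adj - y2.adj - me12
up12 <- y1.adj - y2.adj + me12

The estimated difference (DS + CHD minus DS) in the means after adjustment to the overall mean age of 12.43739 months is 0.03814577, and a 95% confidence interval for the true difference in age-adjusted means is approximately [-1.20, 1.27]. Under the equal-slopes model this difference is the same at every age, so we are 95% confident that, for children of any given age, the mean expressive language score in the DS + CHD group is somewhere between 1.20 lower and 1.27 higher than in the DS group. Since the interval includes zero, we cannot conclude that there is any difference, after adjusting for age, between the mean expressive language scores of children in the DS + CHD and DS groups.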
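Under the equal-slopes model the difference in age-adjusted means equals the CHD coefficient of `lm2`, so `confint()` gives the same interval as the hand computation. A sketch of that equivalence on hypothetical simulated data:

```r
# Hypothetical simulated stand-in for ds
set.seed(8)
n1 <- 20; n2 <- 18
sim <- data.frame(CHD = rep(c(1, 0), times = c(n1, n2)),
                  Age = runif(n1 + n2, min = 6, max = 20))
sim$EL <- 1 + 0.4 * sim$Age + rnorm(n1 + n2, sd = 2)
fit_eq <- lm(EL ~ CHD + Age, data = sim)

x <- sim$Age; y <- sim$EL; g <- sim$CHD
x1 <- mean(x[g == 1]); x2 <- mean(x[g == 0])
bhat <- coef(fit_eq)[["Age"]]
mse  <- sum(residuals(fit_eq)^2) / df.residual(fit_eq)
tval <- qt(0.975, df.residual(fit_eq))
Exx  <- sum(tapply(x, g, function(a) sum((a - mean(a))^2)))  # within-group SS of age

# Difference in adjusted means (DS+CHD minus DS); the overall mean age cancels out
diff_hat <- (mean(y[g == 1]) - bhat * x1) - (mean(y[g == 0]) - bhat * x2)
me12 <- tval * sqrt(mse * (1 / n1 + 1 / n2 + (x1 - x2)^2 / Exx))
manual <- c(lower = diff_hat - me12, upper = diff_hat + me12)

print(manual)
print(confint(fit_eq, "CHD", level = 0.95))
```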