STAT 516 hw 6

Author

Karl Gregory

Thirty sheets of paper were printed, each with one of the following instructions printed on top, such that each instruction appeared on five sheets of paper:

“Draw a triangle”
“Draw any triangle”
“Draw literally any triangle”
“Draw a three-sided shape”
“Draw any three-sided shape”
“Draw literally any three-sided shape”

The sheets of paper were stacked such that the instructions were inserted in the above order, cyclically, five times in the stack, and the sheets were handed out one-by-one to students in a class, such that the desk at which a student sat determined which instruction he or she was given. After following the instructions, the students were asked to measure the lengths of the sides ( $a$ , $b$ , and $c$ ) of their shapes and record them. Twenty-two students attended the class, which resulted in an unbalanced design; to make the design balanced, two statistics graduate students were accosted in the hallway and asked to participate.

The data are in the file triangle_threesided_shape_draw.csv, which can be downloaded from this folder. The R code below prepares the data for analysis: for each set of side lengths it computes the value $d$ , where $d^{2}$ is defined as $$d^{2} = \frac{(a - b)^{2} + (b - c)^{2} + (a - c)^{2}}{3 (a^{2} + b^{2} + c^{2})} .$$ Thus $d$ equals zero exactly when the triangle has equal sides ( $a = b = c$ ), and triangles that are close to equilateral have small values of $d$ . The values of $d$ are used as the response in a two-way ANOVA model.

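tri0 <- read.csv(pathtofile) # replace pathtofile with the path to the downloaded csv file

# side lengths of each drawn shape
a <- tri0$a
b <- tri0$b
c <- tri0$c
# departure from equilateral: d = 0 when a = b = c
d <- sqrt(((a - b)^2 + (b - c)^2 + (a - c)^2 ) / (3*(a^2 + b^2 + c^2)))
# recode F1 to the short labels "thr" (three-sided shape) and "tri" (triangle)
tri <- data.frame(d,
                  F1 = as.factor(ifelse(tri0$F1=="threesided","thr","tri")),
                  F2 = as.factor(tri0$F2))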
tri

             d  F1     F2
1  0.060070625 thr litany
2  0.007850687 tri    any
3  0.214201374 tri litany
4  0.060739671 tri      a
5  0.282573723 tri    any
6  0.021777216 tri      a
7  0.177144244 thr litany
8  0.000000000 tri litany
9  0.017674908 thr      a
10 0.090882270 thr    any
11 0.197750574 tri litany
12 0.161201469 thr      a
13 0.028849488 thr    any
14 0.122755668 thr litany
15 0.174077656 thr      a
16 0.082902667 thr    any
17 0.097071368 thr      a
18 0.290890942 thr    any
19 0.197401838 thr litany
20 0.159156635 tri litany
21 0.336934423 tri    any
22 0.000000000 tri      a
23 0.222756921 tri      a
24 0.179341061 tri    any
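As a sanity check on the definition of $d$ , the short sketch below wraps the formula in a function and evaluates it on a few made-up triangles. The function name `tri_d` and the example side lengths are illustrative assumptions, not part of the assignment.

```r
# Hypothetical helper: departure of a triangle from equilateral,
# following the definition of d^2 given in the text.
tri_d <- function(a, b, c) {
  num <- (a - b)^2 + (b - c)^2 + (a - c)^2
  den <- 3 * (a^2 + b^2 + c^2)
  sqrt(num / den)
}

d_equilateral <- tri_d(2, 2, 2)   # equal sides, so d is exactly 0
d_right       <- tri_d(3, 4, 5)   # sqrt(6 / 150) = 0.2
d_scaled      <- tri_d(6, 8, 10)  # same shape at twice the size
d_sliver      <- tri_d(10, 10, 1) # long, thin triangle

print(c(d_equilateral, d_right, d_scaled, d_sliver))
```

Because the numerator and denominator are both quadratic in the side lengths, $d$ does not change when all three sides are rescaled, so it measures shape only and not the size of the drawing.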

Consider fitting the two-way treatment effects model which assumes the responses arise as $Y_{i j k} = μ + τ_{i} + γ_{j} + (τ γ)_{i j} + ε_{i j k}$ for $k = 1, \dots, n_{i j}$ , $i = 1, \dots, a$ , $j = 1, \dots, b$ , where $ε_{i j k}$ are independent error terms having the $N (0, σ^{2})$ distribution.
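For reference, a short worked statement of what fitting this model amounts to. Because the interaction term is included, the least-squares fitted value in each cell is the cell mean, and the error variance is estimated from the residuals around those means. The symbols $\bar{Y}_{i j \cdot}$ and $N$ are notation introduced here for the sketch:

$$\hat{Y}_{i j k} = \bar{Y}_{i j \cdot} , \qquad \hat{ε}_{i j k} = Y_{i j k} - \bar{Y}_{i j \cdot} , \qquad \hat{σ}^{2} = \frac{1}{N - a b} \sum_{i = 1}^{a} \sum_{j = 1}^{b} \sum_{k = 1}^{n_{i j}} \hat{ε}_{i j k}^{2} , \qquad N = \sum_{i = 1}^{a} \sum_{j = 1}^{b} n_{i j} .$$

When every cell has the same number of replicates $n$ , the error degrees of freedom $N - a b$ reduce to $a b (n - 1)$ .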

1.

What are the two factors in the experiment and how many levels does each have?

The first factor is the noun used in the instruction, either 
'triangle' or 'three-sided shape'; this factor has two levels. The 
second factor is the qualifier, either 'a', 'any', or 'literally any'; 
this factor has three levels.

2.

Make side-by-side boxplots of the responses at all factor level combinations.

boxplot(d~F1:F2,data = tri)

3.

Give the output of table(tri$F1,tri$F2) and explain what it shows.

table(tri$F1,tri$F2)

     
      a any litany
  thr 4   4      4
  tri 4   4      4

The table shows that each of the six factor level 
combinations has n = 4 replicates, so the design is balanced.

4.

Give the means of the responses at all factor level combinations.

aggregate(d ~ F1 + F2, data = tri, FUN = mean)

   F1     F2          d
1 thr      a 0.11250635
2 tri      a 0.07631845
3 thr    any 0.12338134
4 tri    any 0.20167497
5 thr litany 0.13934309
6 tri litany 0.14277715

5.

Give $\hat{σ}$ , the estimate of the standard deviation $σ$ of the error terms.

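lm_out <- lm(d ~ F1 + F2 + F1:F2,data = tri)
# a and b are reused here for the numbers of factor levels; the side-length
# vectors a and b are no longer needed once d has been computed
a <- 2 # levels of F1
b <- 3 # levels of F2
n <- 4 # replicates per factor level combination
ehat <- lm_out$residuals
SSerror <- sum(ehat^2)
MSerror <- SSerror/(a*b*(n-1)) # error df = ab(n-1) = 18
sigma_hat <- sqrt(MSerror)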

We obtain the value 0.1023644.
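The hand computation can be cross-checked against R's built-in extractor. The sketch below is self-contained: it re-enters the 24 values of $d$ and the factor labels from the printed data frame above (so it agrees with the original file only up to the rounding of that printout), and the object names ending in `_chk` are hypothetical.

```r
# d values and factor labels transcribed from the printed data frame (rows 1 to 24)
d_chk <- c(0.060070625, 0.007850687, 0.214201374, 0.060739671, 0.282573723,
           0.021777216, 0.177144244, 0.000000000, 0.017674908, 0.090882270,
           0.197750574, 0.161201469, 0.028849488, 0.122755668, 0.174077656,
           0.082902667, 0.097071368, 0.290890942, 0.197401838, 0.159156635,
           0.336934423, 0.000000000, 0.222756921, 0.179341061)
F1_chk <- c("thr", "tri", "tri", "tri", "tri", "tri", "thr", "tri",
            "thr", "thr", "tri", "thr", "thr", "thr", "thr", "thr",
            "thr", "thr", "thr", "tri", "tri", "tri", "tri", "tri")
F2_chk <- c("litany", "any", "litany", "a", "any", "a", "litany", "litany",
            "a", "any", "litany", "a", "any", "litany", "a", "any",
            "a", "any", "litany", "litany", "any", "a", "a", "any")
tri_chk <- data.frame(d = d_chk, F1 = factor(F1_chk), F2 = factor(F2_chk))

# Full two-way model with interaction, as in the text
fit_chk <- lm(d ~ F1 * F2, data = tri_chk)

# Residual standard error two ways: built-in extractor and by hand
sigma_builtin <- sigma(fit_chk)
sigma_manual  <- sqrt(sum(residuals(fit_chk)^2) / df.residual(fit_chk))

print(c(builtin = sigma_builtin, manual = sigma_manual))
```

Both numbers should agree with the value reported above, since `sigma()` returns the square root of the residual sum of squares divided by the residual degrees of freedom.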

6.

Check whether the assumptions of the two-way ANOVA model are satisfied.

plot(lm_out,which = 1)

lm_levene <- lm(abs(ehat) ~ F1 + F2 + F1:F2,data = tri)
summary(lm_levene)


Call:
lm(formula = abs(ehat) ~ F1 + F2 + F1:F2, data = tri)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.085745 -0.033431 -0.003201  0.028221  0.085745 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)     0.055133   0.026523   2.079   0.0522 .
F1tri           0.018086   0.037510   0.482   0.6355  
F2any           0.028622   0.037510   0.763   0.4553  
F2litany       -0.007203   0.037510  -0.192   0.8499  
F1tri:F2any     0.006238   0.053047   0.118   0.9077  
F1tri:F2litany  0.005373   0.053047   0.101   0.9204  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05305 on 18 degrees of freedom
Multiple R-squared:  0.1535,    Adjusted R-squared:  -0.08164 
F-statistic: 0.6528 on 5 and 18 DF,  p-value: 0.6633

The residuals versus fitted values plot 
shows fairly equal residual spreads in all treatment groups; 
moreover, Levene's test, carried out here as the overall F test 
in the regression of the absolute residuals on the factors, fails 
to reject the null hypothesis of equal variances (p value 0.6633 
in the above output). Therefore it seems we can assume the 
variance of the response to be equal in all treatment groups.

plot(lm_out,which = 2)

The points in the normal QQ plot of the 
residuals lie close to a straight line, suggesting it is safe 
to assume that the responses are normally distributed around 
the treatment group means.

7.

Give the value of the test statistic for the overall $F$ test and the associated $p$ value.

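dbar <- mean(d) # d is the response vector computed during data preparation
SStot <- sum((d - dbar)^2)
SStrt <- SStot - SSerror # treatment SS (main effects plus interaction)
MStrt <- SStrt / (a*b - 1)
Ftest <- MStrt / MSerror
pval <- pf(Ftest,a*b-1,a*b*(n-1),lower.tail = FALSE) # upper-tail probability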

The value of the test statistic is 0.6548065 
and the p value is 0.6618499.

8.

Obtain the complete ANOVA table containing the $F$ test statistics and associated $p$ values for testing the significance of the main effects as well as the interaction effect.

anova(lm_out)

Analysis of Variance Table

Response: d
          Df   Sum Sq   Mean Sq F value Pr(>F)
F1         1 0.001383 0.0013826  0.1319 0.7207
F2         2 0.019404 0.0097022  0.9259 0.4142
F1:F2      2 0.013520 0.0067600  0.6451 0.5363
Residuals 18 0.188613 0.0104785
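As a consistency check, the overall $F$ statistic from question 7 can be recovered from this table. The sequential sums of squares for F1, F2, and F1:F2 add up to the treatment sum of squares on $1 + 2 + 2 = 5$ degrees of freedom (the labels $SS_{Trt}$ and $MS_{Trt}$ are used here only for this check):

$$SS_{Trt} = 0.001383 + 0.019404 + 0.013520 = 0.034307 , \qquad MS_{Trt} = \frac{0.034307}{5} \approx 0.006861 , \qquad F = \frac{0.006861}{0.0104785} \approx 0.655 ,$$

which agrees with the value 0.6548065 up to the rounding of the table entries.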

9.

Generate interaction plots for the two factors. State whether you believe there is an interaction between the two factors.

interaction.plot(tri$F1,tri$F2,tri$d)

interaction.plot(tri$F2,tri$F1,tri$d)

The p value for the interaction effect is 
quite large (0.5363 in the ANOVA table); therefore, even though 
the interaction plots show crossing lines, the crossing is most 
likely due to random noise in the data and not due to any real 
interaction between the two factors.
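To put the crossing lines on a numerical scale, the sketch below computes the difference 'tri' minus 'thr' at each level of the second factor, using the cell means from question 4, and compares each difference with its standard error based on the value of $\hat{σ}$ from question 5. The object names are hypothetical, and the comparison is only a rough guide, not a formal test.

```r
# Cell means as reported in question 4 (rows: F1, columns: F2)
cell_means <- matrix(
  c(0.11250635, 0.12338134, 0.13934309,   # thr: a, any, litany
    0.07631845, 0.20167497, 0.14277715),  # tri: a, any, litany
  nrow = 2, byrow = TRUE,
  dimnames = list(F1 = c("thr", "tri"), F2 = c("a", "any", "litany"))
)

# Difference tri - thr at each level of F2; a sign change across levels
# is what appears as crossing lines in the interaction plot
simple_effects <- cell_means["tri", ] - cell_means["thr", ]

# Standard error of a difference of two cell means with n = 4 replicates each,
# using sigma_hat = 0.1023644 from question 5
se_diff <- 0.1023644 * sqrt(2 / 4)

print(round(simple_effects, 4))
print(round(simple_effects / se_diff, 2))
```

The differences change sign between 'a' and 'any', which is why the lines cross, but each one is only about one standard error or less in size, which is in line with the large p value for the interaction.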

10.

State whether you believe there is a significant main effect associated with either of the two factors.

Since the p value for the interaction is 
large, we can interpret the p values for the main effects; 
these are also large (0.7207 for F1 and 0.4142 for F2), so there 
appears to be no significant interaction effect and no 
significant main effect of either factor.

11.

Use Dunnett’s method to compare the means at all factor level combinations to that of the “Draw a triangle” group. Report the confidence intervals for the differences in means and interpret them.

tri$F1F2 <- as.factor(paste(tri$F1,tri$F2,sep="_"))
library(DescTools)

DunnettTest(tri$d ~ tri$F1F2, control="tri_a",conf.level = 0.95)


  Dunnett's test for comparing several treatments with a control :  
    95% family-wise confidence level

$tri_a
                       diff      lwr.ci    upr.ci   pval    
thr_a-tri_a      0.03618790 -0.16374727 0.2361231 0.9813    
thr_any-tri_a    0.04706289 -0.15287228 0.2469981 0.9460    
thr_litany-tri_a 0.06302464 -0.13691053 0.2629598 0.8505    
tri_any-tri_a    0.12535652 -0.07457865 0.3252917 0.3199    
tri_litany-tri_a 0.06645869 -0.13347648 0.2663939 0.8238    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Even though the 'Draw a triangle' group 
had the smallest mean, and though it is the prompt which, one 
might think, would be most likely to elicit an equilateral triangle, 
the data do not show that the triangles elicited by this 
instruction were statistically significantly closer to 
equilateral than those elicited by the other instructions: 
all five of the Dunnett confidence intervals contain zero.
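The width of these intervals can be checked by hand. The sketch below assumes that `DunnettTest` pools the variance over the six groups (18 error degrees of freedom), which is consistent with the output but is not stated in the source; under that assumption every interval has the form difference ± critical value × standard error, so the critical value can be backed out from any one row. Object names are hypothetical.

```r
sigma_hat_chk <- 0.1023644   # pooled sd estimate from question 5
n_chk <- 4                   # replicates per group

# Standard error of a difference between two group means
se_diff <- sigma_hat_chk * sqrt(2 / n_chk)

# First row of the Dunnett output: thr_a - tri_a
diff_row <- 0.03618790
lwr_row  <- -0.16374727
upr_row  <- 0.2361231

# Half-width of the interval and the implied Dunnett critical value
half_width   <- (upr_row - lwr_row) / 2
implied_crit <- half_width / se_diff

print(c(se_diff = se_diff, half_width = half_width, implied_crit = implied_crit))
```

The implied critical value is noticeably larger than the ordinary two-sided $t$ quantile `qt(0.975, 18)`, which reflects the adjustment for making five comparisons with the same control group.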