----------------------------
First problem
----------------------------

> hours=c(2.4,2.7,2.3,2.5,4.6,4.2,4.9,4.7,4.8,4.5,4.4,4.6,5.8,5.2,5.5,5.3,8.9,9.1,8.7,9.0,
+  9.1,9.3,8.7,9.4,6.1,5.7,5.9,6.2,9.9,10.5,10.6,10.1,13.5,13.0,13.3,13.2)
> a=factor(rep(1:3,each=12))
> b=factor(rep(rep(1:3,each=4),3))
> d=data.frame(hours,a,b)
> levels(d$a)=c("low","medium","high")
> levels(d$b)=c("low","medium","high")
> with(d,interactplot(b,a,hours,confidence=0.95))

(a) There appears to be an interaction: the lines are not parallel, even accounting for error in estimating the sample means (the blue error bars).

> f=lm(hours~a*b)
> Anova(f,type=3)
Anova Table (Type III tests)

Response: hours
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 1857.61  1 30864.90 < 2.2e-16 ***
a            220.02  2  1827.86 < 2.2e-16 ***
b            123.66  2  1027.33 < 2.2e-16 ***
a:b           29.42  4   122.23 < 2.2e-16 ***
Residuals      1.63 27 

(b) The interaction "a:b" is highly significant; we cannot drop it from the model.

> pairs(lsmeans(f,"a",by="b"))
b = 1:
 contrast estimate        SE df t.ratio p.value
 1 - 2      -2.975 0.1734722 27 -17.150  <.0001
 1 - 3      -3.500 0.1734722 27 -20.176  <.0001
 2 - 3      -0.525 0.1734722 27  -3.026  0.0144

b = 2:
 contrast estimate        SE df t.ratio p.value
 1 - 2      -4.325 0.1734722 27 -24.932  <.0001
 1 - 3      -5.675 0.1734722 27 -32.714  <.0001
 2 - 3      -1.350 0.1734722 27  -7.782  <.0001

b = 3:
 contrast estimate        SE df t.ratio p.value
 1 - 2      -4.550 0.1734722 27 -26.229  <.0001
 1 - 3      -8.675 0.1734722 27 -50.008  <.0001
 2 - 3      -4.125 0.1734722 27 -23.779  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates 
> confint(pairs(lsmeans(f,"a",by="b")))
b = 1:
 contrast estimate        SE df lower.CL    upper.CL
 1 - 2      -2.975 0.1734722 27 -3.40511 -2.54489004
 1 - 3      -3.500 0.1734722 27 -3.93011 -3.06989004
 2 - 3      -0.525 0.1734722 27 -0.95511 -0.09489004

b = 2:
 contrast estimate        SE df lower.CL    upper.CL
 1 - 2      -4.325 0.1734722 27 -4.75511 -3.89489004
 1 - 3      -5.675 0.1734722 27 -6.10511 -5.24489004
 2 - 3      -1.350 0.1734722 27 -1.78011 -0.91989004

b = 3:
 contrast estimate        SE df lower.CL    upper.CL
 1 - 2      -4.550 0.1734722 27 -4.98011 -4.11989004
 1 - 3      -8.675 0.1734722 27 -9.10511 -8.24489004
 2 - 3      -4.125 0.1734722 27 -4.55511 -3.69489004

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

(c) All differences in A are significant at all levels of B.  

(d) Yes, A and B are synergistic; when both are at "high" they provide more relief than under an additive model would (with parallel lines).  A=high and B=high provides the highest hours of relief, over 12 hours.

> par(mfrow=c(2,2))
> plot(f)

(e) Residuals vs. fitted does indicates roughly constant variance.  The NPP looks reasonably straight so normality seems fine.

----------------------------
Second problem
----------------------------

> d=read.table("http://people.stat.sc.edu/hansont/stat506/electronics.txt",header=F)
> colnames(d)=c("time","gender","sequence","experience","rep")
> d$gender=factor(d$gender)
> d$sequence=factor(d$sequence)
> d$experience=factor(d$experience)
> levels(d$gender)=c("male","female")
> levels(d$sequence)=c("1","2","3")
> levels(d$experience)=c("<18 months",">=18 months")
> f1=lm(time~gender*sequence*experience,data=d)
> Anova(f1,type=3)
Anova Table (Type III tests)

Response: time
                             Sum Sq Df    F value    Pr(>F)    
(Intercept)                67547504  1 78722.8717 < 2.2e-16 ***
gender                       540361  1   629.7603 < 2.2e-16 ***
sequence                      49320  2    28.7396  6.22e-09 ***
experience                   382402  1   445.6679 < 2.2e-16 ***
gender:sequence                 543  2     0.3161    0.7305    
gender:experience                91  1     0.1064    0.7457    
sequence:experience             911  2     0.5310    0.5914    
gender:sequence:experience       19  2     0.0111    0.9890    
Residuals                     41186 48                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> f2=lm(time~gender+sequence+experience,data=d)

> anova(f2,f1)
Analysis of Variance Table

Model 1: time ~ gender + sequence + experience
Model 2: time ~ gender * sequence * experience
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     55 42750                           
2     48 41186  7      1564 0.2604 0.9662

(a) The ANOVA table on the full model with all interactions shows that we can drop the threeway interaction as well as all twoway interactions.  We test that we can drop all of these simultaneously by fitting the additive model.  The p-value is 0.97 so we accept that the additive models fits adequately.  

> pairwise(f2,sequence) 
                                         
Pairwise comparisons ( hsd ) of sequence 
        estimate signif diff    lower    upper
* 1 - 2   -57.25     21.2363 -78.4863 -36.0137
  1 - 3     6.60     21.2363 -14.6363  27.8363
* 2 - 3    63.85     21.2363  42.6137  85.0863
> lines(pairwise(f2,sequence))
         
3 -23.5 |
1 -16.9 |
2  40.4 

(b) Sequences 3 and 1 take significantly less time than sequence 2, but 3 and 1 are not significantly different.  You could also use Dunnett's procedure here and obtain the same results (as Tukey, above):

> compare.to.best(f2,sequence,lowisbest=TRUE)
          difference allowance
* 2 - 3        63.85  17.24015
  1 - 3         6.60  17.24015
best is 3       0.00        NA

> plot(f2)

(c) Residuals vs. fitted show no evidence on non-constant variance.  NPP looks pretty straight so normality seems ok.

> pairwise(f2,gender) 
                                       
Pairwise comparisons ( hsd ) of gender 
        estimate signif diff    lower    upper
* 1 - 2    189.8    14.42609 175.3739 204.2261

> pairwise(f2,experience) 
                                           
Pairwise comparisons ( hsd ) of experience 
        estimate signif diff    lower    upper
* 1 - 2 159.6667    14.42609 145.2406 174.0928

(d) male=1 and female=2 so we estimate that males take 190 minutes longer than females to assemble 50 boards; the CI does not include zero, so this is a significant difference.  Similarly, inexperienced take about 160 minutes longer to assemble 50 boards relative to experienced, and this is also significant.

----------------------------
Third problem
----------------------------

> reduction=c(0.73,0.86,0.94,1.40,1.62,0.67,0.75,0.81,1.32,1.41,0.15,0.21,0.26,0.75,0.78)
> fat=factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3))
> age=factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5))
> d=data.frame(reduction,fat,age)
> levels(d$age)=c("15-24 yrs","25-34 yrs","35-44 yrs","45-54 yrs","55-64 yrs")
> levels(d$fat)=c("extremely low","fairly low","moderately low")
> par(mfrow=c(1,1))
> with(d,interactplot(age,fat,reduction)) # parallel?
Warning message:
In interactplot(age, fat, reduction) :
  no degrees of freedom for estimating error
 
(a) I think the interaction plot looks reasonably parallel, the "extremely low fat" diet provides the greatest reduction in lipids across the 5 age groups.

> source("http://people.stat.sc.edu/hansont/stat506/tukey.R")
> tukeys.add.test(d$reduction,d$age,d$fat) # accept additive model ok

        Tukey's one df F test for Additivity

data:  d$age and d$fat on d$reduction
F = 6.4453, num df = 1, denom df = 7, p-value = 0.03873
sample estimates:
D estimate 
 0.2723121

(b) Tukey's test rejects additivity with a p-value of 0.04.  We reject additivity at the 5% level.  We will fit the additive model anyway.

> f=lm(reduction~fat+age)
> Anova(f,type=3)
Anova Table (Type III tests)

Response: reduction
             Sum Sq Df F value    Pr(>F)    
(Intercept) 10.6850  1 4424.45 2.904e-12 ***
fat          1.3203  2  273.35 4.326e-08 ***
age          1.4190  4  146.89 1.610e-07 ***
Residuals    0.0193  8                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> pairwise(f,fat)
                                    
Pairwise comparisons ( hsd ) of fat 
        estimate signif diff      lower     upper
* 1 - 2    0.118  0.08881091 0.02918909 0.2068109
* 1 - 3    0.680  0.08881091 0.59118909 0.7688109
* 2 - 3    0.562  0.08881091 0.47318909 0.6508109
> lines(pairwise(f,fat))
          
3 -0.414 |
2  0.148 |
1  0.266 |

(c) There are significant differences in lipid reduction across the three diets.  Specifically, within an age group, "extremely low" sig. reduces lipids 0.12 grams/liter relative to "fairly low" and sig. reduces lipids 0.68 grams/liter relative to "mod. low".  "fairly low" sig. reduces lipids 0.56 grams/liter relative to "mod. low".

> par(mfrow=c(2,2))
> plot(f)

(d) The residuals vs. fitted values shows strong curvature, probably because there is an interaction between age group and diet.  A possible remedy is to acutally fit Tukey's 1-df model and base inference on that, but we did not talk about this in class.  The NPP also appears fairly nonlinear.  A Box-Cox procedure shows a transformation of about lambda=0.7, and the CI does not include 1; this power transformation might also help.  There are nonparametric methods for RCBD, but we didn't talk about them in class; they are available in the "coin" package for R.