---------------------------- First problem ---------------------------- > hours=c(2.4,2.7,2.3,2.5,4.6,4.2,4.9,4.7,4.8,4.5,4.4,4.6,5.8,5.2,5.5,5.3,8.9,9.1,8.7,9.0, + 9.1,9.3,8.7,9.4,6.1,5.7,5.9,6.2,9.9,10.5,10.6,10.1,13.5,13.0,13.3,13.2) > a=factor(rep(1:3,each=12)) > b=factor(rep(rep(1:3,each=4),3)) > d=data.frame(hours,a,b) > levels(d$a)=c("low","medium","high") > levels(d$b)=c("low","medium","high") > with(d,interactplot(b,a,hours,confidence=0.95)) (a) There appears to be an interaction: the lines are not parallel, even accounting for error in estimating the sample means (the blue error bars). > f=lm(hours~a*b) > Anova(f,type=3) Anova Table (Type III tests) Response: hours Sum Sq Df F value Pr(>F) (Intercept) 1857.61 1 30864.90 < 2.2e-16 *** a 220.02 2 1827.86 < 2.2e-16 *** b 123.66 2 1027.33 < 2.2e-16 *** a:b 29.42 4 122.23 < 2.2e-16 *** Residuals 1.63 27 (b) The interaction "a:b" is highly significant; we cannot drop it from the model. > pairs(lsmeans(f,"a",by="b")) b = 1: contrast estimate SE df t.ratio p.value 1 - 2 -2.975 0.1734722 27 -17.150 <.0001 1 - 3 -3.500 0.1734722 27 -20.176 <.0001 2 - 3 -0.525 0.1734722 27 -3.026 0.0144 b = 2: contrast estimate SE df t.ratio p.value 1 - 2 -4.325 0.1734722 27 -24.932 <.0001 1 - 3 -5.675 0.1734722 27 -32.714 <.0001 2 - 3 -1.350 0.1734722 27 -7.782 <.0001 b = 3: contrast estimate SE df t.ratio p.value 1 - 2 -4.550 0.1734722 27 -26.229 <.0001 1 - 3 -8.675 0.1734722 27 -50.008 <.0001 2 - 3 -4.125 0.1734722 27 -23.779 <.0001 P value adjustment: tukey method for comparing a family of 3 estimates > confint(pairs(lsmeans(f,"a",by="b"))) b = 1: contrast estimate SE df lower.CL upper.CL 1 - 2 -2.975 0.1734722 27 -3.40511 -2.54489004 1 - 3 -3.500 0.1734722 27 -3.93011 -3.06989004 2 - 3 -0.525 0.1734722 27 -0.95511 -0.09489004 b = 2: contrast estimate SE df lower.CL upper.CL 1 - 2 -4.325 0.1734722 27 -4.75511 -3.89489004 1 - 3 -5.675 0.1734722 27 -6.10511 -5.24489004 2 - 3 -1.350 0.1734722 27 -1.78011 -0.91989004 b = 3: contrast estimate SE df lower.CL upper.CL 1 - 2 -4.550 0.1734722 27 -4.98011 -4.11989004 1 - 3 -8.675 0.1734722 27 -9.10511 -8.24489004 2 - 3 -4.125 0.1734722 27 -4.55511 -3.69489004 Confidence level used: 0.95 Conf-level adjustment: tukey method for comparing a family of 3 estimates (c) All differences in A are significant at all levels of B. (d) Yes, A and B are synergistic; when both are at "high" they provide more relief than under an additive model would (with parallel lines). A=high and B=high provides the highest hours of relief, over 12 hours. > par(mfrow=c(2,2)) > plot(f) (e) Residuals vs. fitted does indicates roughly constant variance. The NPP looks reasonably straight so normality seems fine. ---------------------------- Second problem ---------------------------- > d=read.table("http://people.stat.sc.edu/hansont/stat506/electronics.txt",header=F) > colnames(d)=c("time","gender","sequence","experience","rep") > d$gender=factor(d$gender) > d$sequence=factor(d$sequence) > d$experience=factor(d$experience) > levels(d$gender)=c("male","female") > levels(d$sequence)=c("1","2","3") > levels(d$experience)=c("<18 months",">=18 months") > f1=lm(time~gender*sequence*experience,data=d) > Anova(f1,type=3) Anova Table (Type III tests) Response: time Sum Sq Df F value Pr(>F) (Intercept) 67547504 1 78722.8717 < 2.2e-16 *** gender 540361 1 629.7603 < 2.2e-16 *** sequence 49320 2 28.7396 6.22e-09 *** experience 382402 1 445.6679 < 2.2e-16 *** gender:sequence 543 2 0.3161 0.7305 gender:experience 91 1 0.1064 0.7457 sequence:experience 911 2 0.5310 0.5914 gender:sequence:experience 19 2 0.0111 0.9890 Residuals 41186 48 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > f2=lm(time~gender+sequence+experience,data=d) > anova(f2,f1) Analysis of Variance Table Model 1: time ~ gender + sequence + experience Model 2: time ~ gender * sequence * experience Res.Df RSS Df Sum of Sq F Pr(>F) 1 55 42750 2 48 41186 7 1564 0.2604 0.9662 (a) The ANOVA table on the full model with all interactions shows that we can drop the threeway interaction as well as all twoway interactions. We test that we can drop all of these simultaneously by fitting the additive model. The p-value is 0.97 so we accept that the additive models fits adequately. > pairwise(f2,sequence) Pairwise comparisons ( hsd ) of sequence estimate signif diff lower upper * 1 - 2 -57.25 21.2363 -78.4863 -36.0137 1 - 3 6.60 21.2363 -14.6363 27.8363 * 2 - 3 63.85 21.2363 42.6137 85.0863 > lines(pairwise(f2,sequence)) 3 -23.5 | 1 -16.9 | 2 40.4 (b) Sequences 3 and 1 take significantly less time than sequence 2, but 3 and 1 are not significantly different. You could also use Dunnett's procedure here and obtain the same results (as Tukey, above): > compare.to.best(f2,sequence,lowisbest=TRUE) difference allowance * 2 - 3 63.85 17.24015 1 - 3 6.60 17.24015 best is 3 0.00 NA > plot(f2) (c) Residuals vs. fitted show no evidence on non-constant variance. NPP looks pretty straight so normality seems ok. > pairwise(f2,gender) Pairwise comparisons ( hsd ) of gender estimate signif diff lower upper * 1 - 2 189.8 14.42609 175.3739 204.2261 > pairwise(f2,experience) Pairwise comparisons ( hsd ) of experience estimate signif diff lower upper * 1 - 2 159.6667 14.42609 145.2406 174.0928 (d) male=1 and female=2 so we estimate that males take 190 minutes longer than females to assemble 50 boards; the CI does not include zero, so this is a significant difference. Similarly, inexperienced take about 160 minutes longer to assemble 50 boards relative to experienced, and this is also significant. ---------------------------- Third problem ---------------------------- > reduction=c(0.73,0.86,0.94,1.40,1.62,0.67,0.75,0.81,1.32,1.41,0.15,0.21,0.26,0.75,0.78) > fat=factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)) > age=factor(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)) > d=data.frame(reduction,fat,age) > levels(d$age)=c("15-24 yrs","25-34 yrs","35-44 yrs","45-54 yrs","55-64 yrs") > levels(d$fat)=c("extremely low","fairly low","moderately low") > par(mfrow=c(1,1)) > with(d,interactplot(age,fat,reduction)) # parallel? Warning message: In interactplot(age, fat, reduction) : no degrees of freedom for estimating error (a) I think the interaction plot looks reasonably parallel, the "extremely low fat" diet provides the greatest reduction in lipids across the 5 age groups. > source("http://people.stat.sc.edu/hansont/stat506/tukey.R") > tukeys.add.test(d$reduction,d$age,d$fat) # accept additive model ok Tukey's one df F test for Additivity data: d$age and d$fat on d$reduction F = 6.4453, num df = 1, denom df = 7, p-value = 0.03873 sample estimates: D estimate 0.2723121 (b) Tukey's test rejects additivity with a p-value of 0.04. We reject additivity at the 5% level. We will fit the additive model anyway. > f=lm(reduction~fat+age) > Anova(f,type=3) Anova Table (Type III tests) Response: reduction Sum Sq Df F value Pr(>F) (Intercept) 10.6850 1 4424.45 2.904e-12 *** fat 1.3203 2 273.35 4.326e-08 *** age 1.4190 4 146.89 1.610e-07 *** Residuals 0.0193 8 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > pairwise(f,fat) Pairwise comparisons ( hsd ) of fat estimate signif diff lower upper * 1 - 2 0.118 0.08881091 0.02918909 0.2068109 * 1 - 3 0.680 0.08881091 0.59118909 0.7688109 * 2 - 3 0.562 0.08881091 0.47318909 0.6508109 > lines(pairwise(f,fat)) 3 -0.414 | 2 0.148 | 1 0.266 | (c) There are significant differences in lipid reduction across the three diets. Specifically, within an age group, "extremely low" sig. reduces lipids 0.12 grams/liter relative to "fairly low" and sig. reduces lipids 0.68 grams/liter relative to "mod. low". "fairly low" sig. reduces lipids 0.56 grams/liter relative to "mod. low". > par(mfrow=c(2,2)) > plot(f) (d) The residuals vs. fitted values shows strong curvature, probably because there is an interaction between age group and diet. A possible remedy is to acutally fit Tukey's 1-df model and base inference on that, but we did not talk about this in class. The NPP also appears fairly nonlinear. A Box-Cox procedure shows a transformation of about lambda=0.7, and the CI does not include 1; this power transformation might also help. There are nonparametric methods for RCBD, but we didn't talk about them in class; they are available in the "coin" package for R.