############################################
# First problem
############################################

I provided R code:

> library(car)      # Anova()
> library(lsmeans)  # lsmeans()
> learning=c(25,29,14,11,11,6,22,18,17,20,5,2)
> school=factor(c(rep("Atlanta",4),rep("Chicago",4),rep("San Francisco",4)))
> instructor=factor(c(1,1,2,2,1,1,2,2,1,1,2,2))
> f=lm(learning~school+school/instructor)
> Anova(f,type=3)
Anova Table (Type III tests)

Response: learning
                  Sum Sq Df F value   Pr(>F)
(Intercept)       2700.0  1 385.714 1.13e-06 ***
school             156.5  2  11.179 0.009473 **
school:instructor  567.5  3  27.024 0.000697 ***
Residuals           42.0  6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(a) With p=0.009 < 0.05 we reject the hypothesis of no school effects at the
5% level. Similarly, p=0.0007 < 0.05 indicates significant instructor
(within-school) effects as well.

> lsmeans(f,"school")  # mean learning averaged over instructors
NOTE: Results may be misleading due to involvement in interactions
 school        lsmean       SE df lower.CL upper.CL
 Atlanta        19.75 1.322876  6 16.51304 22.98696
 Chicago        14.25 1.322876  6 11.01304 17.48696
 San Francisco  11.00 1.322876  6  7.76304 14.23696

Results are averaged over the levels of: instructor
Confidence level used: 0.95

> pairs(lsmeans(f,"school"))  # which school(s) best?
NOTE: Results may be misleading due to involvement in interactions
 contrast                estimate       SE df t.ratio p.value
 Atlanta - Chicago           5.50 1.870829  6   2.940  0.0586
 Atlanta - San Francisco     8.75 1.870829  6   4.677  0.0081
 Chicago - San Francisco     3.25 1.870829  6   1.737  0.2677

Results are averaged over the levels of: instructor
P value adjustment: tukey method for comparing a family of 3 estimates

(b) Atlanta has the highest mean learning, followed by Chicago, then San
Francisco. Using Tukey's HSD, Atlanta is significantly higher than San
Francisco (p=0.008), but Atlanta vs. Chicago (p=0.059) and Chicago vs. San
Francisco (p=0.268) are not significant at the 5% level.

Atlanta Chicago San Francisco
---------------
        ---------------------

> par(mfrow=c(2,2))
> plot(f)

(c) The residuals vs. fitted plot suggests constant variance is okay. The
NPP is reasonably straight, so normality seems okay too.
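The underline diagram above can be produced automatically with a compact
letter display. A minimal sketch, assuming the `multcompView` package is
installed alongside `lsmeans` (the `cld()` method depends on it):

```r
library(car)        # Anova()
library(lsmeans)    # lsmeans(), cld()

learning   <- c(25,29,14,11,11,6,22,18,17,20,5,2)
school     <- factor(c(rep("Atlanta",4), rep("Chicago",4), rep("San Francisco",4)))
instructor <- factor(c(1,1,2,2,1,1,2,2,1,1,2,2))

# instructor nested within school, as in the problem
f <- lm(learning ~ school + school/instructor)

# schools sharing a group label are not significantly different (alpha=0.05)
cld(lsmeans(f, "school"))
```

The group labels correspond line-for-line to the hand-drawn underlines, which
is convenient when the number of levels grows.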
############################################
# Second problem
############################################

> par(mfrow=c(1,1))
> percent=c(28,26,31,27,35,34,29,25,31,29,31,25,27,29,28)
> color=factor(c(rep("blue",5),rep("green",5),rep("orange",5)))
> spaces=c(300,381,226,350,100,153,334,473,264,325,144,359,296,243,252)
> plot(percent~spaces,pch=19,col=c("blue","green","orange")[color])

(a) The response "percent" decreases as the number of parking spaces
increases. At a fixed number of parking spaces, green appears to yield a
higher percent than blue, and blue a higher percent than orange.

> f=lm(percent~color)
> Anova(f,type=3)
Anova Table (Type III tests)

Response: percent
             Sum Sq Df   F value    Pr(>F)
(Intercept) 12615.0  1 1300.5155 1.323e-13 ***
color           7.6  2    0.3918    0.6842
Residuals     116.4 12
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(b) Color is not significant in a model without the number of spaces
(p=0.68).

> f=lm(percent~color+spaces)
> Anova(f,type=3)
Anova Table (Type III tests)

Response: percent
             Sum Sq Df   F value    Pr(>F)
(Intercept) 2075.16  1 17343.097 < 2.2e-16 ***
color         23.39  2    97.748 9.900e-08 ***
spaces       115.08  1   961.809 4.645e-12 ***
Residuals      1.32 11
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(c) Percent changes significantly with both the number of parking spaces
(p<0.0001) and the color of the flyer (p<0.0001).

> lsmeans(f,"color")
 color    lsmean        SE df lower.CL upper.CL
 blue   29.14361 0.1549162 11 28.80264 29.48458
 green  30.48842 0.1573256 11 30.14215 30.83470
 orange 27.36797 0.1560321 11 27.02454 27.71139

Confidence level used: 0.95

> pairs(lsmeans(f,"color"))
 contrast        estimate        SE df t.ratio p.value
 blue - green   -1.344816 0.2218649 11  -6.061  0.0002
 blue - orange   1.775643 0.2191075 11   8.104  <.0001
 green - orange  3.120459 0.2241985 11  13.918  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates

(d) Green is best, followed by blue, then orange.
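The adjusted means that `lsmeans` reports for this ANCOVA are simply the
model's predictions for each color with the covariate held at its overall
mean. A sketch verifying that by hand with base `predict()`:

```r
percent <- c(28,26,31,27,35,34,29,25,31,29,31,25,27,29,28)
color   <- factor(c(rep("blue",5), rep("green",5), rep("orange",5)))
spaces  <- c(300,381,226,350,100,153,334,473,264,325,144,359,296,243,252)

# additive ANCOVA: one common slope in spaces, separate intercepts by color
f <- lm(percent ~ color + spaces)

# predict each color at the average number of parking spaces
newdat <- data.frame(color = levels(color), spaces = mean(spaces))
cbind(newdat, adj.mean = predict(f, newdat))
```

The `adj.mean` column should reproduce the `lsmean` column above; this is why
color becomes significant once spaces is in the model, since the raw color
means are confounded with very different spaces values.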
Using Tukey's HSD, the three colors all provide significantly different
response percentages.

green blue orange
----- ---- ------

> par(mfrow=c(2,2))
> plot(f)

(e) Constant variance and normality seem okay.

############################################
# Third problem
############################################

> library(lme4)    # lmer()
> library(RLRsim)  # exactRLRT()
> par(mfrow=c(1,1))
> sales=c(956,953,938,1049,1008,1032,1025,1123,350,352,338,438,412,449,385,532,
+ 769,766,739,859,880,875,860,915,176,185,168,280,209,223,217,301)
> display=factor(c(rep(1,16),rep(2,16)))
> time=factor(rep(1:4,8))
> store=factor(rep(1:8,each=4))
> d=data.frame(sales,display,time,store)
> with(d,interactplot(time,display,sales,confidence=0.95))  # parallel? (course-provided function)

(a) There is a *lot* of variability; there does not seem to be an interaction
between display and time (the lines are parallel), and there also doesn't
seem to be a significant display effect.

> f=lmer(sales~display*time+(1|store))
> Anova(f,type=3)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: sales
                Chisq Df Pr(>Chisq)
(Intercept)   24.3759  1  7.925e-07 ***
display        0.5315  1     0.4660
time         250.9089  3  < 2.2e-16 ***
display:time   3.2498  3     0.3547
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(b) The display:time interaction is not significant (p=0.35).

> f=lmer(sales~display+time+(1|store))  # additive model
> Anova(f,type=3)  # display significant? time significant?
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: sales
               Chisq Df Pr(>Chisq)
(Intercept)  24.3759  1  7.925e-07 ***
display       0.5315  1      0.466
time        247.9595  3  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(c) The display does not significantly affect sales (p=0.47); however, time
is significant (p<0.0001).
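One way to see why store was modeled as a random effect is to look at the
estimated variance components directly. A sketch using `lme4::VarCorr()`,
refitting the additive model so the block is self-contained:

```r
library(lme4)

sales   <- c(956,953,938,1049,1008,1032,1025,1123,350,352,338,438,412,449,385,532,
             769,766,739,859,880,875,860,915,176,185,168,280,209,223,217,301)
display <- factor(c(rep(1,16), rep(2,16)))
time    <- factor(rep(1:4, 8))
store   <- factor(rep(1:8, each = 4))

# additive mixed model: fixed display and time, random store intercepts
f <- lmer(sales ~ display + time + (1|store))

# estimated sigma_rho (store) vs. residual sigma
print(VarCorr(f))
```

The store standard deviation dwarfs the residual standard deviation, which
foreshadows the overwhelming exactRLRT result below.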
> lsmeans(f,"time")
 time  lsmean       SE   df lower.CL upper.CL
 1    595.000 125.1642 6.02  288.925  901.075
 2    604.375 125.1642 6.02  298.300  910.450
 3    583.750 125.1642 6.02  277.675  889.825
 4    687.125 125.1642 6.02  381.050  993.200

Results are averaged over the levels of: display
Confidence level used: 0.95

> pairs(lsmeans(f,"time"))
 contrast estimate      SE df t.ratio p.value
 1 - 2      -9.375 7.33215 21  -1.279  0.5860
 1 - 3      11.250 7.33215 21   1.534  0.4359
 1 - 4     -92.125 7.33215 21 -12.565  <.0001
 2 - 3      20.625 7.33215 21   2.813  0.0474
 2 - 4     -82.750 7.33215 21 -11.286  <.0001
 3 - 4    -103.375 7.33215 21 -14.099  <.0001

Results are averaged over the levels of: display
P value adjustment: tukey method for comparing a family of 4 estimates

(d) Only time is significant. Period 4 is best (most sales), followed by 2,
1, then 3.

4   2 1 3
-   ---
      ---

There are significantly more sales during period 4 than during periods 1, 2,
or 3, and periods 2 and 3 differ significantly (p=0.047). There is no
significant difference between periods 1 and 2 or between periods 1 and 3.

> exactRLRT(f)  # tests H0: sigma_rho=0

	simulated finite sample distribution of RLRT.
	(p-value based on 10000 simulated values)

data:
RLRT = 122.2407, p-value < 2.2e-16

(e) With p < 2.2e-16 we reject H0: sigma_rho=0 at the 5% level. There is
significant store-to-store variability in sales.

> plot(f)
> qqnorm(ranef(f)$store[,1])  # NPP of estimated rho_i

(f) The residuals vs. fitted plot shows (slightly) increasing variance. The
NPP of the estimated rho_i seems to indicate non-normality.
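The non-normality noted in (f) can be paired with a quick numeric check. A
sketch applying a Shapiro-Wilk test to the eight predicted store effects;
with only 8 values the test has little power, so treat it as a rough guide
alongside the plot, not a verdict:

```r
library(lme4)

sales   <- c(956,953,938,1049,1008,1032,1025,1123,350,352,338,438,412,449,385,532,
             769,766,739,859,880,875,860,915,176,185,168,280,209,223,217,301)
display <- factor(c(rep(1,16), rep(2,16)))
time    <- factor(rep(1:4, 8))
store   <- factor(rep(1:8, each = 4))

f <- lmer(sales ~ display + time + (1|store))

# predicted store effects (BLUPs), one per store
rho_hat <- ranef(f)$store[, 1]

qqnorm(rho_hat); qqline(rho_hat)  # same NPP as in the solution
shapiro.test(rho_hat)             # small p would flag non-normality
```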