/* SAS Analysis of a Two-Factor Study */

/* We use the Castle Bakery data found in Chapter 19 */
/* and used for the example from class. */

data bakery;
input sales height width store;
cards;
47 1 1 1
43 1 1 2
46 1 2 1
40 1 2 2
62 2 1 1
68 2 1 2
67 2 2 1
71 2 2 2
41 3 1 1
39 3 1 2
42 3 2 1
46 3 2 2
;
run;

PROC GLM data=bakery;
CLASS height width;
MODEL sales = height width height*width;
LSMEANS height width height*width;
OUTPUT out=pred p=ybar r=resid;
run;

/* The CLASS statement tells SAS that height and width are both factors. */

/* The LSMEANS statement here produces the least-squares means, which for these */
/* balanced data equal the sample mean sales for each level of height, for each */
/* level of width, and for each height-width combination. */

/* The ANOVA table is printed out in the PROC GLM output.  The first thing we do */
/* is to test for significant interaction.  The SAS output shows the P-value for */
/* this test is 0.3747, so at a 0.05 significance level, we have no significant */
/* interaction between height and width. */

/* Since there are no significant interaction effects, we may test for the */
/* effects of the height factor and of the width factor directly.  At alpha=.05, */
/* there is a significant effect on sales due to display height (P-value < .0001), */
/* implying that the mean sales are significantly different at the various levels */
/* of height.  However, there is no significant effect on sales due to width */
/* (P-value = 0.3226). */

/* Note that SAS gives both Type I SS and Type III SS.  Since these data are */
/* balanced (same number of observations in each "cell"), these outputs are the */
/* same.  We will later look at an example of unbalanced data.
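*/

/* A minimal sketch to check the LSMEANS output above, assuming the "bakery" data */
/* set created earlier.  Because the design is balanced, the least-squares means */
/* should agree with the raw sample means.  PROC MEANS with WAYS 1 2 prints the */
/* mean sales for each level of height, for each level of width, and for each */
/* height-width cell. */

PROC MEANS data=bakery n mean;
CLASS height width;
WAYS 1 2;
VAR sales;
run;

/* The one-way tables should match the LSMEANS for height and for width, and the
   two-way table should match the LSMEANS for height*width.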
*/

/***********************************************************************************/

/* Interaction plots for the Castle Bakery data: */

/* These use the "pred" data set created in the OUTPUT statement of PROC GLM above. */

/* We can plot the fitted mean sales (ybar) against height for each value of width: */

symbol1 i = join v=circle l=32 c = black;
symbol2 i = join v=star l=32 c = black;
PROC GPLOT data=pred;
PLOT ybar*height = width;
run;

/* or we can plot the fitted mean sales against width for each value of height: */

symbol1 i = join v=circle l=32 c = black;
symbol2 i = join v=star l=32 c = black;
symbol3 i = join v=plus l=32 c = black;
PROC GPLOT data=pred;
PLOT ybar*width = height;
run;

/* Somewhat nicer-looking plots: */

PROC SGPLOT data=pred;
SERIES X=HEIGHT Y=YBAR / GROUP=WIDTH;
RUN;

/* Or */

PROC SGPLOT data=pred;
SERIES X=WIDTH Y=YBAR / GROUP=HEIGHT;
RUN;

/* These plots show graphically (not formally) that there is some interaction */
/* between height and width (the lines are not exactly parallel), but it is very */
/* mild.  In fact, the formal hypothesis test for interaction reveals that the */
/* interaction is not significant. */

/***********************************************************************************/

/* Plots to Check Model Assumptions: */

goptions reset=all;
/* The above line resets the graphical plotting options. */

symbol1 v=circle l=32 c = black;

/* Residuals against fitted values, with a horizontal reference line at 0: */
PROC GPLOT data=pred;
PLOT resid*ybar/vref=0;
run;

/* Normal Q-Q plot of the residuals (data=pred is named explicitly rather than */
/* relying on the most recently created data set): */
PROC UNIVARIATE data=pred noprint;
QQPLOT resid / normal;
run;

/* This produces a residual plot (against fitted values) and a normal Q-Q plot */
/* of the residuals.  We see no evidence of nonconstant error variance in the */
/* residual plot, but there may be some non-normality of errors, based on the */
/* Q-Q plot. */

/* These figures may be compared to those in Figure 19.10 (pg. 843) of the book.
*/

/***********************************************************************************/

/* Formal Tests of Model Assumptions */

/* For the Brown-Forsythe test when there are multiple factors, we must create an */
/* artificial "factor" whose levels are all the distinct factor level combinations. */
/* For example, for the Castle Bakery data set: */

DATA new;
SET bakery;
if height=1 and width = 1 then heightwidth=11;
if height=2 and width = 1 then heightwidth=21;
if height=3 and width = 1 then heightwidth=31;
if height=1 and width = 2 then heightwidth=12;
if height=2 and width = 2 then heightwidth=22;
if height=3 and width = 2 then heightwidth=32;
run;

PROC GLM data = new;
CLASS heightwidth;
MODEL sales = heightwidth;
MEANS heightwidth / HOVTEST=BF;
run;

/* Note that SAS does NOT perform the Brown-Forsythe test in this example because */
/* there are FEWER THAN 3 observations in EACH treatment.  However, code similar */
/* to this will perform the B-F test when there are enough observations in each cell. */

/*** And the Shapiro-Wilk test for normality is produced by: ****/

PROC UNIVARIATE DATA=pred normal;
VAR resid;
RUN;

/* The Shapiro-Wilk P-value is 0.0433, which is below 0.05, so at the 0.05 level */
/* we formally reject the normality assumption.  Possibly a transformation of the */
/* response variable is appropriate here. */
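
/* A minimal sketch of how a transformation could be tried.  The notes above do */
/* not say which transformation to use; the natural log is only an assumption */
/* here, and the names bakery_log, logsales, pred_log, yhat_log and resid_log */
/* are hypothetical.  The idea is to refit the same two-factor model to the */
/* transformed response and rerun the Shapiro-Wilk test on the new residuals. */

DATA bakery_log;
SET bakery;
logsales = log(sales);   /* natural log of sales (assumed transformation) */
run;

PROC GLM data=bakery_log;
CLASS height width;
MODEL logsales = height width height*width;
OUTPUT out=pred_log p=yhat_log r=resid_log;
run;

PROC UNIVARIATE DATA=pred_log normal;
VAR resid_log;
RUN;

/* If the Shapiro-Wilk P-value for resid_log is still small, a different */
/* transformation (for example, the square root) could be tried the same way. */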