/* This example shows the analysis of a factorial experiment with unbalanced data */
/* using the Table 11.2 (p. 514) [in the updated edition of the book, this is */
/* now Table 11.3 on page 585] data example we looked at in class. */

/* Entering the data and defining the variables: */

data table112;
input A C Y;
cards;
1 1 4
1 1 5
1 1 6
1 2 8
2 1 5
2 2 7
2 2 9
;
run;

/* We will use PROC GLM to get the standard ANOVA table: */

/* The factors are A and C, which we specify in the CLASS statement. */
/* To match the book's output on page 517, we include the interaction A*C. */
/* As it turns out, the interaction is not significant here and can be ignored. */

PROC GLM data=table112;
CLASS A C;
MODEL Y = A C A*C;
*MEANS A C;   /* <-- This is wrong here! */
LSMEANS A C / STDERR;
run;

/* With UNBALANCED data, the MEANS statement calculates the estimates improperly. */
/* The MEANS statement does not account for the different numbers of observations per cell. */
/* That is why it is "commented out" above: to show you what NOT to do. */

/* On the other hand, the LSMEANS statement does it correctly. */
/* Using the LSMEANS output, we can see that the proper estimate */
/* of (alpha_1 - alpha_2) is (6.5 - 6.5) = 0. */
/* The STDERR option gives standard errors for these estimates. */

/* Note also that for unbalanced data in the two-way ANOVA, the Type I SS and Type III SS */
/* are NOT the same. In this case, we should look at the Type III SS. This correctly */
/* gives an SSA of zero for this example, since we have seen in class that the proper */
/* conclusion is that there is zero sample variation between the means of the levels */
/* of factor A. */

/* Just to show that this is exactly the same as using dummy variables: */

data table112;
input A C Y;
if A=1 then dummy1A=1;
 else dummy1A=-1;
if C=1 then dummy1C=1;
 else dummy1C=-1;
dummy1Adummy1C = dummy1A*dummy1C;
cards;
1 1 4
1 1 5
1 1 6
1 2 8
2 1 5
2 2 7
2 2 9
;
run;

PROC REG data=table112;
MODEL Y = dummy1A dummy1C dummy1Adummy1C;
test dummy1A=0;
test dummy1C=0;
test dummy1Adummy1C=0;
run;

/****************************************************************************/

/* IMPORTANT: What if one (or more) of the factors had more than 2 levels? */
/* We need (t-1) dummy variables to represent the t categories! */
/* When setting up the dummy variables, we use -1 for the LAST category. */
/* This implies the restriction that the tau_i values sum to zero. */

/* Example 1: Suppose factor A had three possible levels (1, 2, 3) instead of two. */
/* Suppose factor C still had two levels (1, 2). */

/* Adjusted data table (now factor A has three separate levels) */

data tableadjust;
input A C Y;
cards;
1 1 4
1 1 5
1 1 6
1 2 8
2 1 5
2 2 7
2 2 9
3 1 11
3 2 12
;
run;

PROC GLM data=tableadjust;
CLASS A C;
MODEL Y = A C A*C;
*MEANS A C;   /* <-- This is wrong here! */
LSMEANS A C / STDERR;
run;

/* The dummy variable approach gives identical conclusions: */

data tableadjust;
input A C Y;
/* Need TWO dummy variables associated with factor A! */
/* Need only ONE dummy variable associated with factor C */
if A=1 then dummy1A=1;
 else if A=2 then dummy1A=0;
 else dummy1A=-1;
if A=1 then dummy2A=0;
 else if A=2 then dummy2A=1;
 else dummy2A=-1;
if C=1 then dummy1C=1;
 else dummy1C=-1;
dummy1Adummy1C = dummy1A*dummy1C;
dummy2Adummy1C = dummy2A*dummy1C;
cards;
1 1 4
1 1 5
1 1 6
1 2 8
2 1 5
2 2 7
2 2 9
3 1 11
3 2 12
;
run;

PROC REG data=tableadjust;
MODEL Y = dummy1A dummy2A dummy1C dummy1Adummy1C dummy2Adummy1C;
test dummy1Adummy1C=0, dummy2Adummy1C=0;   /* test of the interaction effect */
test dummy1A=0, dummy2A=0;                 /* test of the factor A effect */
test dummy1C=0;                            /* test of the factor C effect */
run;
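
/****************************************************************************/

/* A minimal sketch (an addition, not part of the book's example) showing one */
/* way to see WHY the MEANS and LSMEANS estimates differ for unbalanced data. */
/* The choice of PROC MEANS here is our own assumption about how to check it. */

/* Step 1: count the observations and compute the sample mean of Y in each */
/* A*C cell of the original two-level data. The cell counts are 3, 1, 1, 2 */
/* (unbalanced), and the cell means are 5, 8, 5, 8. */

PROC MEANS data=table112 N MEAN;
CLASS A C;
VAR Y;
run;

/* Step 2: compare with the raw marginal means of Y for each level of A. */
/* These weight each cell by its number of observations: */
/*   A=1: (4+5+6+8)/4 = 5.75      A=2: (5+7+9)/3 = 7 */

PROC MEANS data=table112 N MEAN;
CLASS A;
VAR Y;
run;

/* With the interaction in the model and no empty cells, the LSMEANS for A */
/* are instead the UNWEIGHTED averages of the cell means from Step 1: */
/*   A=1: (5+8)/2 = 6.5           A=2: (5+8)/2 = 6.5 */
/* which agrees with the estimate (6.5 - 6.5) = 0 of (alpha_1 - alpha_2). */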