/* SAS example of multiple linear regression */ /* The data are form the portrait studio example discussed in class. */ /* people16 and income will be or predictor variables (X1 and X2). */ /* sales will be our response variable (Y). */ data studio; input people16 income sales; cards; 68.5 16.7 174.4 45.2 16.8 164.4 91.3 18.2 244.2 47.8 16.3 154.6 46.9 17.3 181.6 66.1 18.2 207.5 49.5 15.9 152.8 52.0 17.2 163.2 48.9 16.6 145.4 38.4 16.0 137.2 87.9 18.3 241.9 72.8 17.1 191.1 88.4 17.4 232.0 42.9 15.8 145.3 52.5 17.8 161.1 85.7 18.4 209.7 41.3 16.5 146.4 51.7 16.3 144.0 89.6 18.1 232.6 82.7 19.1 224.1 52.3 16.0 166.5 ; run; /* In the MODEL statement, the response variable goes on the */ /* left of the equals sign, and the predictors go on the right. */ /* The option "clb" produces 95% CIs for the individual betas. */ PROC REG DATA = studio; MODEL sales = people16 income / clb; run; /* NOTE: */ /* 95% is the default confidence level */ /* to specify another confidence level, use (e.g., for 90%) */ * MODEL sales = people16 income / clb alpha = 0.10; /* Find the results of the overall F-test in the SAS output. */ /* Also find the coefficient of multiple determination R^2. */ /* Find the 95% CIs for beta_1 and for beta_2 on the SAS output. */ /* Also the the p-value for the two-sided test for */ /* whether the each of the true coefficients is zero. */ /* Compare the results in the SAS output to the results in the */ /* book (Sec. 6.9) for the portrait studio example. */ /**********************************************************************************/ /* Interval Estimation of E(Y_h) */ /* Example 1: */ /* We want to estimate the mean sales */ /* for cities with 65.4 thousand people 16 or younger */ /* and per capita disposable income 17.6 thousand dollars, */ /* with a 95% CI. */ /* Let's add the value X1 = 65.4 and X2 = 17.6 to our data set: */ /* We'll leave a missing value (.) for sales. */ DATA Xvalues; input people16 income sales; CARDS; 65.4 17.6 . ; DATA studio; SET studio Xvalues; ; /* The option clm will give us CIs for the mean of Y, */ /* for the values of X1 and X2 in the data set. */ /* The option alpha=0.05 tells SAS we want a 95% CI. */ /* We don't actually need that option since 95% is */ /* the default coverage. */ PROC REG DATA = studio; MODEL sales = people16 income / clm alpha = 0.05; run; /* Look at the last line of the "Output Statistics". */ /* This shows the predictions and confidence intervals */ /* for the city with x1 = 65.4 and X2 = 17.6. */ /* We see the fitted value for this observation was */ /* 191.104 thousand dollars. */ /* The 95% CI for mean sales for all cities with */ /* 65.4 thousand people 16 or younger and per capita */ /* disposable income 17.6 thousand dollars */ /* is between 185.3 and 196.9 thousand dollars. */ /*********** Residual Plots ************/ /*** The following code will produce residual plots for this multiple regression ****/ PROC REG DATA=studio; MODEL sales = people16 income / P R; OUTPUT OUT=NEW P=PRED R=RES; PROC GPLOT DATA=NEW; PLOT RES*PRED='+'/ VREF=0; PROC UNIVARIATE noprint ; QQPLOT RES / normal; RUN; /**** The INSIGHT procedure produces somewhat nicer-looking plots *****/ /* For the Q-Q plot, choose "Residual Normal Q-Q" under the Graphs menu */ /* when the PROC INSIGHT output is showing **/ PROC INSIGHT; OPEN studio; FIT sales = people16 income; RUN;