STAT 516 -- EXAM 1 REVIEW SHEET

I. Simple Linear Regression (SLR)
   A. Basic Ideas
      1. What is a Model?
      2. The Two Types of Variables in SLR
      3. Statistical Relationship between Two Variables
      4. Mathematical Form of the SLR Model
         a. Deterministic and Random Components
         b. Meanings/Interpretations for the Regression Coefficients (Intercept & Slope)
         c. Conditional Mean of Y
      5. Four Assumptions for the SLR Model
      6. Extrapolation
      7. Understanding Scatter Plots
   B. Estimating Parameters
      1. Estimated Slope and Intercept
         a. Formulas in SLR
         b. Proper interpretations of their values
      2. Idea behind Least Squares Estimation
      3. Estimating sigma^2 with MSE
   C. Partitioning the Total Sum of Squares into SSR and SSE
      1. Computational Formulas for TSS, SSR, and SSE
      2. Reasoning behind the ANOVA test for beta_1
      3. Test statistic & procedure for the F-test for the slope beta_1
   D. Other Inference
      1. Sampling distribution of the estimate beta_1-hat
      2. Test statistic & procedure for the t-test for the slope beta_1
      3. What does this test tell us about the relationship between Y and X?
      4. Some advantages of t-test over (equivalent) F-test in SLR
      5. CI for the true slope beta_1
      6. Inference about the Response Variable
         a. CI for the Mean Response at a particular X-value
         b. PI for the Response of a New Observation at a particular X-value
         c. How are these two intervals different?
         d. Which should be wider?
   E. Correlation
      1. Mathematical Definition of Correlation Coefficient r
      2. Properly Interpreting a value of r
      3. Mathematical Definition of Coefficient of Determination r^2
      4. Properly Interpreting a value of r^2
      5. Relationship between r^2 and the F statistic (see p. 354 and p. 408)
   F. Regression Diagnostics (also applies to MLR)
      1. What is the residual for each data point?
      2. Residual Plots
         a. What is Residual Plot vs. Fitted Values?
         b. Using it to check for violations of the four model assumptions
         c. Using normal Q-Q plot of the residuals
      3. Remedies for Violations of Assumptions
         a. Transformations of the Variable(s)
         b. Interpretations/Predictions in terms of TRANSFORMED variables

II. Multiple Linear Regression (MLR)
   A. Basic Ideas
      1. Mathematical Form of the MLR Model
      2. Notation: m independent variables
      3. Purposes of the MLR model (similar to purposes of SLR model)
      4. Interpretation for (estimated) Intercept
      5. Interpretation for (estimated) Partial Effect, for each indep. variable (be careful)
   B. Inference about the MLR Model
      1. Overall F-test for the MLR Model
         a. What is it testing?
         b. Test Statistic & Procedure
         c. "Error degrees of freedom" and "Regression degrees of freedom" in MLR
      2. T-tests for Individual Coefficients
         a. What are these testing?
         b. One- and Two-Tailed Tests
         c. Properly Interpreting Conclusions / SAS Output
      3. Definition of MSE (as an estimate for sigma^2) in the MLR model
      4. F-tests on Sets of Independent Variables
         a. What are the Hypotheses we are Testing Here?
         b. Idea of Comparing Reduced Model and Full Model
         c. Interpreting the Results of the TEST statement in SAS to do this sort of F-test
      5. Inference about the Response Variable
         a. CI for the Mean Response at a particular set of X-values
         b. PI for the Response of a New Observation at a particular set of X-values
         c. How are these two intervals different?
   C. Other aspects of Multiple Regression
      1. Coefficient of Determination R^2 and its interpretation in the MLR model
      2. Special Regression Models
         a. Polynomial Models
         b. Multiplicative Model
         c. Transforming to Linearize these Models
         d. When should these models be used?
      3. Multicollinearity
         a. What is multicollinearity?
         b. Common Problems Caused by Multicollinearity
         c. Detecting Multicollinearity with VIFs
         d. Possible Remedies for Multicollinearity
      4. (Independent) Variable Selection
         a. Advantages/disadvantages to Having Many Independent Variables
         b. Advantages/disadvantages to Having Few Independent Variables
         c. C(p) criterion: what does this measure?
         d. Adjusted R-squared criterion: what does this measure?
      5. Detecting Outliers/Influence Points
         a. Detecting Outliers in MLR using Studentized Residuals
         b. Detecting High-leverage Points in MLR using Hat Diagonal values
         c. Detecting Influence Points in MLR using DFFITS
         d. How are these concepts different?
         e. Various Rules of Thumb
         f. Dealing with Outliers/Influence Points
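
Study aid for items I.B through I.E: the sketch below works through the standard SLR formulas on a tiny data set, to show how the estimated slope and intercept, the TSS = SSR + SSE partition, MSE, the F and t statistics for beta_1, and r^2 fit together. It is an illustration only and not part of the course materials: the data values are made up, the function name slr_fit is hypothetical, and plain Python is used here in place of the SAS output referred to above.

```python
import math

def slr_fit(x, y):
    """Least squares fit of y = b0 + b1*x; returns the usual SLR summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares

    # Least squares estimates of slope and intercept
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Partition of TSS into regression and error pieces
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)

    mse = sse / (n - 2)                  # estimate of sigma^2, with n - 2 error d.f.
    f_stat = ssr / mse                   # MSR / MSE, where MSR = SSR / 1 in SLR
    t_stat = b1 / math.sqrt(mse / sxx)   # b1 divided by its standard error
    r = sxy / math.sqrt(sxx * tss)       # sample correlation coefficient

    return {"b0": b0, "b1": b1, "tss": tss, "ssr": ssr, "sse": sse,
            "mse": mse, "f": f_stat, "t": t_stat, "r": r}

# Made-up illustrative data (not from the course or textbook)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = slr_fit(x, y)
for name, value in fit.items():
    print(f"{name:>4} = {value:.4f}")
```

Three checks worth making on the printed values: ssr + sse should equal tss, the F statistic should equal the square of the t statistic (item I.D.4), and r^2 should equal ssr / tss (item I.E.3).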