STAT 704 -- POST-TEST 2 REVIEW SHEET

Pre-I. More Tools for Regression
  F. Diagnostic Measures
    1. Added-variable (Partial Regression) Plots
    2. Outliers and Influential Cases
      a. (Internally) Studentized Residuals
      b. Leverage Values (Hat diagonal elements)
      c. Cook's Distance
      d. DFFITS
      e. The various rules of thumb
    3. What to do about Outliers/Influential Points

I. Advanced Remedial Measures in Linear Regression
  A. Weighted Least Squares
    1. For what violation is it a remedy?
    2. Estimating the variance of each error term epsilon_i
    3. Idea behind determining the weights
    4. Effect of WLS on the model and inferences
  B. Ridge Regression
    1. For what violation is it a remedy?
    2. Idea behind the biasing constant
    3. Effect on inferences
    4. Connection between ridge regression and LASSO regression
  C. Robust Regression
    1. For what violation(s) is it a remedy?
    2. Idea of M-estimation in regression
    3. LAR (L1) regression
    4. Huber's method
    5. Effect on inferences

II. Nonlinear Regression
  A. Main Definition of a Nonlinear Regression Model
    1. When would we use nonlinear regression?
    2. Intrinsically linear vs. intrinsically nonlinear mean response functions
  B. Parameter Estimation in Nonlinear Regression
    1. Numerical Optimization "Search" Methods
    2. Common search methods
    3. Basic Idea behind the Gauss-Newton Method
      a. Role of initial estimates
      b. Role of the Taylor series approximation
      c. Updating parameter estimates
      d. When does the algorithm stop?
      e. Possible concerns about search algorithms
  C. Common Nonlinear Models
    1. Role/meaning of the different parameters in the models
    2. Example graphs of the mean response function
    3. How we intuitively pick reasonable initial values for the parameters
  D. Inference about Parameters
    1. Justification for large-sample CIs
    2. Hougaard's statistic and its purpose
  E. Use of Residual Plots in Nonlinear Regression

III. Other Regression Models
  A. Generalized Linear Models (GLMs)
    1. Three characteristics of a GLM
      a. Exponential family
      b. Linear predictor
      c. Link function
    2.
       Classical normal linear regression model as a GLM
  B. Logistic Regression
    1. Binary response variable
    2. E(Y) representing P(Y = 1)
    3. Problems with using a standard linear model for binary Y
    4. Logistic Mean Response Function
      a. Shape of the function and role of beta_1
      b. Model in terms of the odds (and log-odds) that Y = 1
      c. Logistic regression model as a GLM
      d. Other possible link functions
    5. Parameter Estimation for Logistic Regression
      a. Interpretation of b_1 and exp(b_1)
      b. Odds ratio and estimated odds ratio
      c. Extension to Multiple Logistic Regression
    6. Inferences in Logistic Regression
      a. LR test about all betas
      b. Wald test about a single beta_j
      c. CI for beta_j or for the associated odds ratio
    7. Model Selection
    8. Hosmer-Lemeshow Goodness-of-Fit Test
    9. Pearson Residuals and Outlier Diagnostics
    10. CI for the "Mean Response" pi_h
      a. Point estimate
      b. Interpretation of the CI
    11. Prediction of Binary Y_h for a New Observation
  C. Poisson Regression
    1. Count response variable
    2. Poisson regression model as a GLM
    3. Poisson regression model with the ln link
    4. Interpretation of b_1 in terms of exp(b_1)
    5. Inference about the regression parameters
    6. Goodness of fit: residual deviance and Pearson X^2
    7. Deviance residuals and Pearson residuals
    8. Predicted mean response values and CI for mu-hat_i

*** Note: Only the very basic ideas about nonparametric regression
    might be on the final exam, not the specific details ***

  D. Nonparametric Regression
    1. Model for nonparametric regression
    2. Advantages/disadvantages of the nonparametric regression approach
    3. Kernel Regression Estimators
      a. Idea of local averaging within a window
      b. Weighted averaging through use of a kernel
      c. Idea of bandwidth and its effect on the estimated curve
    4. Spline Methods
      a. Definition of a Spline
      b. Cubic regression splines
      c. Choice of number/placement of knots and effect on the estimated curve
      d. Smoothing splines and the penalized SSE criterion
      e. Choice of lambda and its effect on the estimated curve
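For review, the idea behind weighted least squares (topic I.A) can be sketched in a few lines of code. This is a minimal illustrative sketch, not course code: the function name wls_simple is hypothetical, and it fits the simple (one-predictor) model y = b0 + b1*x by solving the weighted normal equations in closed form, with each case weighted by w_i (typically w_i = 1 / estimated Var(epsilon_i), so high-variance cases get less influence).

```python
# Sketch of weighted least squares for simple linear regression
# y = b0 + b1*x.  Minimizing sum w_i*(y_i - b0 - b1*x_i)^2 gives the
# weighted normal equations solved below.  (Hypothetical helper, not
# from the course materials.)

def wls_simple(x, y, w):
    """Closed-form WLS estimates (b0, b1) with case weights w."""
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Slope and intercept from the two weighted normal equations:
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# With all weights equal, WLS reduces to ordinary least squares.
b0, b1 = wls_simple([1.0, 2.0, 3.0, 4.0],
                    [3.1, 4.9, 7.2, 8.8],   # roughly y = 1 + 2x
                    [1.0, 1.0, 1.0, 1.0])
```

Note how the weights enter every sum: a case with a small w_i barely moves the normal equations, which is exactly the intended effect of WLS under nonconstant error variance.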
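The basic idea of kernel regression (topic D.3) -- a weighted local average whose weights come from a kernel and a bandwidth h -- can also be sketched briefly. This is a hedged illustration only, since just the basic ideas of nonparametric regression are in scope: the function name kernel_estimate is hypothetical, and a Gaussian kernel is assumed for concreteness (a Nadaraya-Watson-type estimate).

```python
import math

# Sketch of a kernel regression estimate at a point x0: a weighted
# average of the observed y_i, with weights from a Gaussian kernel.
# A small bandwidth h tracks the data closely (wiggly curve); a large
# h averages over a wider window and smooths more.

def kernel_estimate(x0, x, y, h):
    """Local weighted average of y at x0 using a Gaussian kernel with bandwidth h."""
    weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    return sum(wi * yi for wi, yi in zip(weights, y)) / sum(weights)

# The estimate at any x0 is always a weighted average of the y_i,
# so it lies between min(y) and max(y).
yhat = kernel_estimate(1.5, [0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 5.0, 4.0], 0.7)
```

Trying several values of h on the same data is a quick way to see the bandwidth's effect on the estimated curve (item D.3.c).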