STAT 704 -- MIDTERM REVIEW SHEET

I. Random Variables and Important Distributions
   A. Random Variables
      1. Density Function of a r.v.
      2. Expected Value of a r.v. & Properties
      3. Variance of a r.v. & Properties
      4. Standard Deviation of a r.v.
      5. Covariance and Correlation
      6. Independence (and how it's related to covariance)
   B. Linear Combinations of Random Variables
      1. Expected Value of Linear Combination
      2. Variance of Linear Combination
      3. Variance of Linear Combination of Independent r.v.'s
      4. Covariance of Linear Combinations of Independent r.v.'s
   C. Sample Mean Y-bar
      1. Expected Value and Variance of Y-bar
      2. Central Limit Theorem
   D. Normal Distribution
      1. Properties of Normal r.v.'s
      2. Linear Combinations of Normal r.v.'s
   E. Related Distributions
      1. Chi-square distribution
      2. t distribution
      3. F distribution
      4. Relationships among these Distributions

II. One-Sample and Two-Sample Models
   A. Single-Sample Normal-Data Model
      1. Sample Variance and its Distribution
      2. Role of t distribution
      3. CI for mu and proper interpretation
      4. Hypothesis test (t-test) about mu (see the sketch following Section II)
         a. Test statistic
         b. Alternative Hypotheses
         c. Rejection rules and P-value
         d. Proper conclusion for hypothesis test
      5. Connection between CI and Two-Sided Tests
   B. Paired-Samples Normal-Data Model
      1. Finding the differences
      2. Connection to One-Sample Inference
      3. Correct Interpretation of CI for Mean Difference
   C. Two-Independent-Samples Normal-Data Model
      1. Equal-Variance Situation
         a. CI for mu_1 - mu_2
         b. Test statistic and its distribution
         c. Alternative Hypotheses for test
      2. Unequal-Variance Situation
         a. CI for mu_1 - mu_2
         b. Test statistic
         c. Alternative Hypotheses for test
      3. Correct Interpretation of CI for Difference of Means
   D. Applicability of t-procedures
      1. Requirements for t-procedures
      2. Robustness of t-procedures
      3. Large-sample-size situation
      4. Checking assumptions
      5. Normal Q-Q plots (and other plots)
   E. Nonparametric Alternatives
      1. Sign test
      2. Wilcoxon Signed-Rank Test
      3. Wilcoxon Rank-Sum Test
      4. In what situation should each of these tests be used?
      5. What are the data assumptions for each of these tests?
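A minimal sketch of the one-sample t procedures in II.A.3-4, written in Python with NumPy/SciPy (the same language is used for all sketches on this sheet); the data values and the null value mu_0 = 5 are made up purely for illustration:

# One-sample t CI and t-test (II.A.3-4); data are hypothetical.
import numpy as np
from scipy import stats

y = np.array([4.1, 5.3, 4.8, 5.9, 5.1, 4.4, 5.6, 5.0])   # hypothetical sample
n, ybar, s = len(y), y.mean(), y.std(ddof=1)              # ddof=1 gives the sample SD

# 95% CI for mu:  ybar +/- t(.025, n-1) * s/sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (ybar - t_crit * s / np.sqrt(n), ybar + t_crit * s / np.sqrt(n))

# Two-sided t-test of H0: mu = 5 vs. Ha: mu != 5
t_stat, p_value = stats.ttest_1samp(y, popmean=5)

print(f"95% CI for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.3f}, P-value = {p_value:.4f}")       # reject H0 if P-value < alpha

The CI and the two-sided test agree in the usual way (II.A.5): the test rejects H0: mu = 5 at level .05 exactly when 5 falls outside the 95% CI.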
III. Simple Linear Regression Model
   A. Basics of the SLR Model
      1. Statistical Relationship between Y and X
      2. Role of Response and Predictor in SLR Model
      3. Mathematical Equation and Error Assumptions for SLR Model
      4. Deterministic and Random Component
      5. Mean Response and Variance of Response
      6. Model Using Matrix Notation
   B. Estimation of beta_0 and beta_1
      1. Idea behind Least Squares Method
      2. The "Normal Equations" and the Least Squares Estimators b_0 and b_1
      3. Properties of Least Squares Estimators
   C. Fitted Values and Residuals
   D. Interpreting Predicted Values and Estimated Slope
   E. Estimating the Error Variance
   F. Normal Error Assumption
      1. Why do we need to assume normality?
   G. Inference in SLR Model
      1. Sampling distribution of estimated slope b_1
      2. CI for true slope beta_1
      3. t-test about true slope beta_1
      4. What does this test tell us about the relationship between Y and X?
   H. Inference about the Response Variable
      1. CI for the Mean Response at a particular X-value
      2. PI for the Response of a New Observation at a particular X-value
      3. How are these two intervals different?
      4. Which should be wider?
   I. Analysis of Variance Approach
      1. SSTO, SSR, and SSE
      2. "Partitioning" the Sample Variation in Y
      3. ANOVA table
      4. Reasoning behind the F-test about beta_1
      5. Test statistic & procedure for the F-test for the slope beta_1
      6. Idea of Reduced and Full Models
   J. Measuring the Linear Relationship between Y and X
      1. Definition of Coefficient of Determination R^2
      2. Properly Interpreting a value of R^2
      3. Definition of Correlation Coefficient r
      4. Properly Interpreting a value of r

IV. Miscellaneous Regression-Related Topics
   A. Correlation Models
      1. Key Difference between regression model and correlation model
      2. Bivariate normal model
      3. Population correlation coefficient rho
      4. Testing whether rho = 0
      5. Large-sample CI for rho
   B. Cautions about Regression
      1. Predicting Values into the Future
      2. Extrapolation and its Associated Dangers
      3. Does linear association between Y and X imply causation?
      4. Concerns with simultaneous multiple predictions/inferences
      5. Effect of Measurement Error in the X variable(s)

V. Introduction to Multiple Linear Regression (MLR)
   A. General MLR model with k predictors
      1. Interpretations of regression coefficients in the MLR model
      2. Meaning and Examples of General Linear Model
      3. General Linear Model in Matrix Terms
         a. Y vector
         b. X matrix
         c. beta vector
         d. epsilon vector
      4. Fitting the MLR (estimating the beta's)
         a. vector of estimated coefficients
         b. vector of fitted values
         c. vector of residuals
         d. Interpretations of estimated regression coefficients
   B. Analysis of Variance in MLR
      1. SSTO, SSR, SSE
      2. Degrees of Freedom for each SS
      3. Overall ANOVA F-test (see the sketch following Section V)
         a. Null and alternative hypotheses
         b. Test statistic value
      4. Coefficient of Multiple Determination R^2
      5. Adjusted R^2
   C. Inference about Individual Regression Coefficients
      1. CI for an individual beta
      2. Test for whether an individual beta = 0
         a. Tests marginal effect of individual predictor
         b. "in the presence of" other predictors in the model
         c. Bonferroni and Holm corrections for multiple tests
   D. CI for the mean response, E(Y_h)
   E. Prediction Interval for 'new' response value, Y_h(new)
   F. Checking Model Assumptions through Residual Plots
      1. What are the major regression model assumptions?
      2. What values are plotted on the axes of a residual plot?
      3. Checking for model misspecification
      4. Checking for non-constant error variance
      5. Checking for departures from normality
         a. Graphical methods
         b. Formal tests
   G. Transformations of Variables
      1. Purpose of transformations
         a. Transformations of X variable(s)
         b. Transformations of Y
         c. Transformations of both
      2. Which types of transformations alleviate which violations?
      3. Reverse-transformations back to units of original variable(s)
   H. Extra SS and F-tests
      1. Behavior of SSE as predictors are added to the model
      2. Reduced model vs. Full model
      3. Testing whether some (but not all) predictors can be dropped
         a. Null and alternative hypotheses
         b. Test statistic value
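A minimal sketch of fitting an MLR model by least squares and building the overall ANOVA F-test (III.B, V.A.4, V.B); the data, with n = 8 observations and k = 2 predictors, are invented for illustration:

# MLR fit, ANOVA partition, overall F-test, R^2 (III.B, V.A.4, V.B); data are hypothetical.
import numpy as np
from scipy import stats

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([3.1, 4.0, 6.2, 6.8, 9.1, 9.5, 12.2, 12.0])

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])        # X matrix (first column of 1's)

b = np.linalg.lstsq(X, y, rcond=None)[0]         # least squares estimates b = (X'X)^{-1} X'y
yhat = X @ b                                     # vector of fitted values
e = y - yhat                                     # vector of residuals

SSTO = np.sum((y - y.mean()) ** 2)
SSE  = np.sum(e ** 2)
SSR  = SSTO - SSE                                # partitioning: SSTO = SSR + SSE

MSR, MSE = SSR / k, SSE / (n - k - 1)
F = MSR / MSE                                    # overall F-test of H0: beta_1 = beta_2 = 0
p_value = stats.f.sf(F, k, n - k - 1)
R2 = SSR / SSTO
adjR2 = 1 - (SSE / (n - k - 1)) / (SSTO / (n - 1))

print("b =", np.round(b, 3))
print(f"F = {F:.2f} on ({k}, {n - k - 1}) df, P-value = {p_value:.4g}")
print(f"R^2 = {R2:.3f}, adjusted R^2 = {adjR2:.3f}")

The same code covers SLR as a special case (drop the x2 column); with a single predictor, the overall F-test is equivalent to the two-sided t-test about beta_1, since F = t^2 (III.I).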
VI. Advanced Considerations in Regression
   A. Multicollinearity
      1. What is multicollinearity?
      2. Common Problems Caused by Multicollinearity
      3. Detecting Multicollinearity with VIFs
      4. Possible Remedies for Multicollinearity
   B. Polynomial Regression
      1. Determining whether polynomial regression is needed
      2. Centering predictor variables
      3. Polynomial regression with two predictors
      4. Extrapolation in polynomial regression
   C. Interaction Models
      1. Basic meaning of interaction between two predictors
      2. Interaction plots
      3. F-test for whether interactions are significant
   D. Model Building
      1. Confirmatory vs. Exploratory Observational Studies
      2. Forward Stepwise Regression Method
      3. "All-possible-subsets" approach
      4. Criteria for choosing "best" model
         a. Adjusted R^2
         b. AIC
         c. BIC
         d. C_p criterion
      5. Overall goals in model selection
   E. Model Validation
      1. Data splitting (cross-validation)
         a. Training Set and Validation Set
         b. MSPR
      2. n-fold cross-validation
         a. PRESS statistic and how it is used
   F. Diagnostic Measures
      1. Added-variable (Partial Regression) Plots
      2. Outliers and Influential Cases (see the sketch at the end of this sheet)
         a. (Internally) Studentized Residuals
         b. Leverage Values (Hat diagonal elements)
         c. Cook's Distance
         d. DFFITS
         e. The various rules of thumb
      3. What to do about Outliers/Influential Points
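A minimal sketch of the diagnostic quantities in VI.A.3 and VI.F.2 (VIFs, leverage values, internally studentized residuals, Cook's distance), computed directly from the hat matrix; it reuses the same hypothetical data as the MLR sketch above:

# Regression diagnostics from the hat matrix (VI.A.3, VI.F.2); data are hypothetical.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([3.1, 4.0, 6.2, 6.8, 9.1, 9.5, 12.2, 12.0])

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]                                   # number of coefficients (k + 1)

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
h = np.diag(H)                                   # leverage values h_ii
e = y - H @ y                                    # residuals
MSE = np.sum(e ** 2) / (n - p)

r = e / np.sqrt(MSE * (1 - h))                   # internally studentized residuals
cooks_d = (r ** 2 / p) * (h / (1 - h))           # Cook's distance

# VIFs: diagonal of the inverse correlation matrix of the predictors
Z = np.column_stack([x1, x2])
vif = np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))

print("leverage > 2p/n?", h > 2 * p / n)         # one common rule of thumb
print("Cook's D:", np.round(cooks_d, 3))
print("VIFs:", np.round(vif, 2))

Commonly cited rules of thumb (VI.F.2.e): flag leverage values above 2p/n, studentized residuals beyond roughly plus or minus 2 to 3, Cook's distances near or above 1 (or large relative to the F(p, n-p) distribution), and VIFs above 10 as signs of potential outliers, influence, or multicollinearity worth a closer look.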