STAT 516 hw 4

Author

Karl Gregory

Download this data set (heights.csv) and store it in the folder containing the .qmd file for your homework assignment.

The data set contains the self-reported heights (in feet and inches), lengths of index and pinky fingers (in millimeters), shoe size, and shoe size category (“m”/“w”) of several students. The code below imports the data into R and converts one “uk” shoe size to a “w” size according to a sizing chart found online. In addition, the heights are converted to centimeters, and a data frame hg is created containing a version of the data ready for analysis.

# import the data
hg0 <- read.table("heights.csv", sep = ",", header = TRUE)

# clean one data point
# https://www.grivetoutdoors.com/pages/shoe-size-chart
hg0$shoe[hg0$shoe_wm == 'uk'] <- 9 
hg0$shoe_wm[hg0$shoe_wm == 'uk'] <- 'w'

# create data frame for analysis
hg <- data.frame(height = (hg0$ft*12 + hg0$in.)*2.54, # get heights in cm
                 ind_mm = hg0$ind_mm,
                 pnk_mm = hg0$pnk_mm,
                 shoe = hg0$shoe,
                 shoe_wm = hg0$shoe_wm)

# view the first few rows of the data frame
head(hg)

It is of interest to use the multiple linear regression model to predict the height of a person based on his or her index and pinky finger lengths, shoe size, and shoe size gender “m” or “w”.

1.

Consider the multiple linear regression model which uses all the covariates—the index and pinky finger lengths, shoe size, and shoe size gender—to predict the height of a person.

1.a

Give the critical value for the overall F test at significance level α=0.05. This is the value such that, when it is exceeded by the value of the test statistic Ftest, we reject H0 at α=0.05.
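As a rough sketch of how such a critical value can be looked up in R, the block below uses `qf()` with hypothetical values of the sample size `n` and the number of covariates `p`. These numbers are placeholders and are not taken from heights.csv; with the real data, `n` would be the number of rows used in the fit.

```r
# Hypothetical sizes (assumptions, not taken from heights.csv)
n <- 50      # number of observations
p <- 4       # number of covariates in the model
alpha <- 0.05

# Upper-alpha quantile of the F distribution with p and n - p - 1 degrees of freedom
F_crit <- qf(1 - alpha, df1 = p, df2 = n - p - 1)
F_crit
```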

1.b

Fit the model and give the value of the test statistic for the overall F test of significance as well as the p-value associated with it. Give an interpretation of these values.
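One way to pull the overall F statistic and its p-value out of a fitted model is sketched below. The data frame `sim` and its columns `x1`, `x2`, `g`, and `y` are simulated stand-ins invented for illustration; they are not the homework data.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  g = sample(c("m", "w"), n, replace = TRUE))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2 + g, data = sim)

# summary()$fstatistic holds the F value, numerator df, and denominator df
fs <- unname(summary(fit)$fstatistic)
F_test <- fs[1]
p_val  <- pf(F_test, df1 = fs[2], df2 = fs[3], lower.tail = FALSE)
c(F_test = F_test, p_value = p_val)
```

The same two numbers appear on the last line of the output of `summary(fit)`.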

1.c

Report the variance inflation factor for each of the four covariates.
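A minimal sketch of a variance inflation factor computation in base R follows, using the definition VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing covariate j on the other covariates. The simulated data frame `sim` and its column names are assumptions made for illustration. Add-on packages such as car also provide a `vif()` function.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- sim$x1 + rnorm(n, sd = 0.5)   # deliberately correlated with x1
sim$g  <- sample(c("m", "w"), n, replace = TRUE)
sim$y  <- 1 + 2 * sim$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2 + g, data = sim)
X <- model.matrix(fit)[, -1]            # design matrix without the intercept column

# Regress each column on the remaining columns and convert R^2 to a VIF
vif <- sapply(colnames(X), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2)
})
vif
```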

2.

Fit a model with all covariates except the shoe size gender covariate.

2.a

Use the full-reduced model F test to test whether the shoe size gender covariate has a nonzero regression coefficient in the full model. Give the value of the test statistic as well as the p value. Interpret the result of your test.
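The mechanics of a full-reduced model F test can be sketched with `anova()`, which compares two nested `lm` fits. The data below are simulated, and the names `sim`, `x1`, `x2`, `g`, and `y` are hypothetical. The statistic is also recomputed by hand from the two error sums of squares as a check.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  g = sample(c("m", "w"), n, replace = TRUE))
sim$y <- 1 + 2 * sim$x1 + 0.5 * (sim$g == "w") + rnorm(n)

fit_full <- lm(y ~ x1 + x2 + g, data = sim)
fit_red  <- lm(y ~ x1 + x2, data = sim)

# Row 2 of the comparison table holds the F statistic and its p-value
tab <- anova(fit_red, fit_full)
F_test <- tab$F[2]
p_val  <- tab$`Pr(>F)`[2]

# Same statistic from the error sums of squares of the two fits
sse_full <- sum(resid(fit_full)^2)
sse_red  <- sum(resid(fit_red)^2)
q <- df.residual(fit_red) - df.residual(fit_full)   # number of coefficients dropped
F_hand <- ((sse_red - sse_full) / q) / (sse_full / df.residual(fit_full))

c(F_test = F_test, F_hand = F_hand, p_value = p_val)
```

The same pattern applies when the reduced model drops more than one covariate; only `q` changes.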

2.b

Obtain the value of the test statistic Ttest and the p value associated with it for testing H0: βj = 0 versus H1: βj ≠ 0, where j is the index of the shoe size gender covariate.
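A sketch of how a single coefficient's t statistic and p-value can be read from the coefficient table of a fitted model follows. The data are simulated and all names are hypothetical. The row name `"gw"` arises because R builds an indicator for level "w" of the two-level variable `g`.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  g = sample(c("m", "w"), n, replace = TRUE))
sim$y <- 1 + 2 * sim$x1 + 0.5 * (sim$g == "w") + rnorm(n)

fit <- lm(y ~ x1 + x2 + g, data = sim)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
ct <- summary(fit)$coefficients
T_test <- ct["gw", "t value"]
p_val  <- ct["gw", "Pr(>|t|)"]

# Two-sided p-value recomputed from the t distribution with the residual df
p_hand <- 2 * pt(abs(T_test), df = df.residual(fit), lower.tail = FALSE)
c(T_test = T_test, p_value = p_val, p_hand = p_hand)
```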

2.c

Explain the relationship between the test statistic Ftest of the full-reduced model F test, when the reduced model removes a single covariate, and the test statistic Ttest for testing whether that same covariate is significantly related to the response.

3.

Fit a model using only the two shoe size covariates.

3.a

Use the full-reduced model F test to test whether either of the finger length variables has a nonzero regression coefficient in the full model. Give the value of the test statistic as well as the p value. Interpret the result of your test.

3.b

Compute the variance inflation factors of the two variables in this model. Comment on how these compare to their counterparts in the model with all four covariates.

3.c

Comment on anything else interesting about these two variance inflation factors!

4.

Fit a model using only the index and pinky finger lengths.

4.a

Give the value of the test statistic and the p value for the full-reduced model F test for testing whether either of the two shoe size covariates has a nonzero regression coefficient in the full model. Interpret the result.

4.b

Suppose the index and pinky finger measurements were recorded in centimeters instead of millimeters. Describe the effect this would have on the value of the test statistic Ftest in the previous part as well as on the p value.

5.

5.a

Use Mallows’ Cp statistic to select the best model among all possible submodels involving the four predictors.
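The block below is one possible base-R sketch of an all-subsets search scored by Mallows’ Cp, using Cp = SSE_sub / MSE_full - (n - 2k), where k counts the coefficients in the submodel including the intercept. The simulated data and the names `sim`, `x1`, `x2`, `x3`, and `y` are assumptions for illustration, and the intercept-only model is left out of the search for brevity.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

preds <- c("x1", "x2", "x3")
mse_full <- summary(lm(y ~ x1 + x2 + x3, data = sim))$sigma^2

# Every non-empty subset of the predictors, as a list of character vectors
subsets <- unlist(lapply(seq_along(preds),
                         function(k) combn(preds, k, simplify = FALSE)),
                  recursive = FALSE)

# Mallows' Cp for each submodel
cp <- sapply(subsets, function(s) {
  fit <- lm(reformulate(s, response = "y"), data = sim)
  sse <- sum(resid(fit)^2)
  k <- length(coef(fit))            # coefficients, intercept included
  sse / mse_full - (n - 2 * k)
})
names(cp) <- sapply(subsets, paste, collapse = " + ")
sort(cp)
```

Submodels with small Cp, and with Cp close to their number of coefficients, are the usual candidates. The full model always has Cp equal to its own coefficient count, so it serves as a sanity check on the computation.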

5.b

Give the model chosen by backward selection based on the AIC criterion.

5.c

Give the model chosen by forward selection based on the AIC criterion.
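Backward and forward selection by AIC can both be sketched with `step()` from base R. The data are simulated and the variable names are hypothetical; `trace = 0` only suppresses the printed search path.

```r
# Simulated stand-in data (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = sim)   # largest model considered
null <- lm(y ~ 1, data = sim)              # intercept-only model

# Backward selection starts from the full model and drops terms
back <- step(full, direction = "backward", trace = 0)

# Forward selection starts from the null model and adds terms up to the full model
fwd <- step(null, scope = list(lower = ~ 1, upper = formula(full)),
            direction = "forward", trace = 0)

formula(back)
formula(fwd)
```

The two searches need not end at the same model, so it is worth comparing the two formulas.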

6.

Considering your “best” model, check whether there are any outlying observations.
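One common way to screen for outlying observations, sketched here under assumptions, is to compare externally studentized residuals with a Bonferroni-adjusted t cutoff. This is only one of several reasonable diagnostics. The data are simulated, with an artificial outlier planted at observation 5, and all names are hypothetical.

```r
# Simulated stand-in data with one planted outlier (hypothetical; not heights.csv)
set.seed(1)
n <- 40
sim <- data.frame(x1 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)
sim$y[5] <- sim$y[5] + 8

fit <- lm(y ~ x1, data = sim)

# Externally studentized (deleted) residuals
r <- rstudent(fit)

# Bonferroni-adjusted two-sided cutoff at overall level 0.05;
# the reference t distribution has n - (number of coefficients) - 1 df
cutoff <- qt(1 - 0.05 / (2 * n), df = n - length(coef(fit)) - 1)
flagged <- which(abs(r) > cutoff)
flagged
```

A plot of `r` against the fitted values, or the default `plot(fit)` diagnostics, gives a visual complement to the cutoff rule.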