# import the data
hg0 <- read.table("heights.csv", sep = ",", header = TRUE)
# clean one data point
# https://www.grivetoutdoors.com/pages/shoe-size-chart
hg0$shoe[hg0$shoe_wm == 'uk'] <- 9
hg0$shoe_wm[hg0$shoe_wm == 'uk'] <- 'w'
# create data frame for analysis
hg <- data.frame(height = (hg0$ft*12 + hg0$in.)*2.54, # get heights in cm
                 ind_mm = hg0$ind_mm,
                 pnk_mm = hg0$pnk_mm,
                 shoe = hg0$shoe,
                 shoe_wm = hg0$shoe_wm)
# view the first few rows of the data frame
head(hg)

STAT 516 hw 4
Download this data set and store it in the folder containing the .qmd file for your homework assignment.
The data set contains the self-reported heights (in feet and inches), lengths of index and pinky fingers (in millimeters), shoe size, and shoe size category (“m”/“w”) of several students. The code above imports the data into R and converts one “uk” shoe size to a “w” size according to a sizing chart found online. In addition, the heights are converted to centimeters, and a data frame hg is created containing a version of the data ready for analysis.
The goal is to use a multiple linear regression model to predict a person's height from index and pinky finger lengths, shoe size, and shoe size gender (“m” or “w”).
1.
Consider the multiple linear regression model which uses all the covariates—the index and pinky finger lengths, shoe size, and shoe size gender—to predict the height of a person.
1.a
Give the critical value for the overall F test at significance level
1.b
Fit the model and give the value of the test statistic for the overall F test of significance as well as the p-value associated with it. Give an interpretation of these values.
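As a minimal sketch of the quantities asked for in 1.a and 1.b, the code below fits a four-covariate model to simulated data and extracts the overall F statistic, its p-value, and a critical value. The data frame sim and all of its values are made up for illustration and are not the class data, and alpha = 0.05 is only a placeholder assumption for the significance level.

```r
# Simulated stand-in for the class data (all values are hypothetical)
set.seed(516)
n <- 40
shoe_wm <- factor(sample(c("m", "w"), n, replace = TRUE))
ind_mm  <- rnorm(n, mean = 75, sd = 6)
pnk_mm  <- 0.8 * ind_mm + rnorm(n, sd = 3)
shoe    <- ifelse(shoe_wm == "m", 10, 8) + rnorm(n, sd = 1)
height  <- 100 + 0.5 * ind_mm + 0.3 * pnk_mm + 3 * shoe + rnorm(n, sd = 5)
sim <- data.frame(height, ind_mm, pnk_mm, shoe, shoe_wm)

# Full model with all four covariates
fit_full <- lm(height ~ ind_mm + pnk_mm + shoe + shoe_wm, data = sim)

# summary() stores the overall F statistic with its two degrees of freedom
fstat   <- summary(fit_full)$fstatistic      # names: value, numdf, dendf
f_value <- fstat[["value"]]
df1     <- fstat[["numdf"]]                  # number of slope coefficients
df2     <- fstat[["dendf"]]                  # residual degrees of freedom

# Upper-tail p-value and critical value (alpha is an assumed placeholder)
alpha   <- 0.05
p_value <- pf(f_value, df1, df2, lower.tail = FALSE)
f_crit  <- qf(1 - alpha, df1, df2)

c(F = f_value, df1 = df1, df2 = df2, p = p_value, critical = f_crit)
```

Rejecting when the F statistic exceeds the critical value is equivalent to rejecting when the p-value falls below alpha.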
1.c
Report the variance inflation factor for each of the four covariates.
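One way to see what a variance inflation factor measures is to compute it by hand: regress each covariate on the others and take 1 / (1 - R^2). The sketch below does this in base R on simulated data (the data frame sim is hypothetical, not the class data). The vif() function in the car package reports comparable values if that package is installed.

```r
# Simulated stand-in for the class data (all values are hypothetical)
set.seed(516)
n <- 40
shoe_wm <- factor(sample(c("m", "w"), n, replace = TRUE))
ind_mm  <- rnorm(n, mean = 75, sd = 6)
pnk_mm  <- 0.8 * ind_mm + rnorm(n, sd = 3)
shoe    <- ifelse(shoe_wm == "m", 10, 8) + rnorm(n, sd = 1)
sim <- data.frame(ind_mm, pnk_mm, shoe, shoe_wm)

# Design matrix without the intercept; shoe_wm becomes a 0/1 dummy column
X <- model.matrix(~ ind_mm + pnk_mm + shoe + shoe_wm, data = sim)[, -1]

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest
vif_manual <- sapply(colnames(X), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_cor <- diag(solve(cor(X)))

round(rbind(vif_manual, vif_cor), 3)
```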
2.
Fit a model with all covariates except the shoe size gender covariate.
2.a
Use the full-reduced model F test to test whether the shoe size gender covariate has a nonzero regression coefficient in the full model. Give the value of the test statistic as well as the p-value. Interpret the result of your test.
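The full-reduced model F test can be sketched with anova(), which compares two nested lm fits. The example below uses simulated data (the data frame sim is hypothetical, not the class data) and also recomputes the statistic from the residual sums of squares to show where the number comes from.

```r
# Simulated stand-in for the class data (all values are hypothetical)
set.seed(516)
n <- 40
shoe_wm <- factor(sample(c("m", "w"), n, replace = TRUE))
ind_mm  <- rnorm(n, mean = 75, sd = 6)
pnk_mm  <- 0.8 * ind_mm + rnorm(n, sd = 3)
shoe    <- ifelse(shoe_wm == "m", 10, 8) + rnorm(n, sd = 1)
height  <- 100 + 0.5 * ind_mm + 0.3 * pnk_mm + 3 * shoe + rnorm(n, sd = 5)
sim <- data.frame(height, ind_mm, pnk_mm, shoe, shoe_wm)

fit_full    <- lm(height ~ ind_mm + pnk_mm + shoe + shoe_wm, data = sim)
fit_reduced <- lm(height ~ ind_mm + pnk_mm + shoe, data = sim)

# anova() on nested fits: the second row holds the F statistic and p-value
cmp     <- anova(fit_reduced, fit_full)
f_value <- cmp$F[2]
p_value <- cmp$`Pr(>F)`[2]

# Same statistic by hand:
# F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / [SSE_full / df_full]
sse_r <- deviance(fit_reduced)
sse_f <- deviance(fit_full)
df_r  <- df.residual(fit_reduced)
df_f  <- df.residual(fit_full)
f_manual <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

c(F_anova = f_value, F_manual = f_manual, p = p_value)
```

The same pattern applies to any pair of nested models; only the formula of the reduced fit changes.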
2.b
Obtain the value of the test statistic
2.c
Explain the relationship between the value of the test statistic
3.
Fit a model using only the two shoe size covariates.
3.a
Use the full-reduced model F test to test whether either of the finger length variables has a nonzero regression coefficient in the full model. Give the value of the test statistic as well as the p-value. Interpret the result of your test.
3.b
Compute the variance inflation factors of the two variables in this model. Comment on how these compare to their counterparts in the model with all four covariates.
3.c
Comment on anything else interesting about these two variance inflation factors!
4.
Fit a model using only the index and pinky finger lengths.
4.a
Give the value of the test statistic and the p-value for the full-reduced model F test for testing whether either of the two shoe size covariates has a nonzero regression coefficient in the full model. Interpret the result.
4.b
Suppose the index and pinky finger measurements were recorded in centimeters instead of millimeters. Describe the effect this would have on the value of the test statistic
5.
5.a
Use Mallows’
5.b
Give the model chosen by backward selection based on the AIC criterion.
5.c
Give the model chosen by forward selection based on the AIC criterion.
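Backward and forward selection by AIC can both be run with the base R function step(). The sketch below uses simulated data (the data frame sim is hypothetical, not the class data); backward selection starts from the full model, while forward selection starts from the intercept-only model and is told the largest model it may reach through the scope argument.

```r
# Simulated stand-in for the class data (all values are hypothetical)
set.seed(516)
n <- 40
shoe_wm <- factor(sample(c("m", "w"), n, replace = TRUE))
ind_mm  <- rnorm(n, mean = 75, sd = 6)
pnk_mm  <- 0.8 * ind_mm + rnorm(n, sd = 3)
shoe    <- ifelse(shoe_wm == "m", 10, 8) + rnorm(n, sd = 1)
height  <- 100 + 0.5 * ind_mm + 0.3 * pnk_mm + 3 * shoe + rnorm(n, sd = 5)
sim <- data.frame(height, ind_mm, pnk_mm, shoe, shoe_wm)

fit_full <- lm(height ~ ind_mm + pnk_mm + shoe + shoe_wm, data = sim)
fit_null <- lm(height ~ 1, data = sim)

# Backward: start from the full model and drop terms while AIC improves
backward <- step(fit_full, direction = "backward", trace = 0)

# Forward: start from the intercept-only model; scope gives the upper model
forward <- step(fit_null, scope = formula(fit_full),
                direction = "forward", trace = 0)

formula(backward)
formula(forward)
```

Setting trace = 1 (the default) prints the AIC of each candidate at every step, which is useful for explaining why a term was dropped or added.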
6.
Considering your “best” model, check whether there are any outlying observations.
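One common way to screen for outlying observations is to compare externally studentized residuals with a Bonferroni-adjusted t cutoff. The sketch below is an illustration on simulated data, not the class data: the data frame sim is hypothetical, observation 7 is shifted on purpose so there is something to find, the model fit merely stands in for whichever model was chosen as "best", and the 0.05 familywise level is an assumption.

```r
# Simulated stand-in for the class data (all values are hypothetical)
set.seed(516)
n <- 40
ind_mm <- rnorm(n, mean = 75, sd = 6)
shoe   <- rnorm(n, mean = 9, sd = 1.5)
height <- 100 + 0.8 * ind_mm + 3 * shoe + rnorm(n, sd = 5)
height[7] <- height[7] + 50          # plant one outlier for illustration
sim <- data.frame(height, ind_mm, shoe)

# Placeholder for the chosen "best" model
fit <- lm(height ~ ind_mm + shoe, data = sim)

# Externally studentized residuals follow a t distribution with
# (residual df - 1) degrees of freedom when there is no outlier
r_stud <- rstudent(fit)

# Bonferroni cutoff for n two-sided tests at an assumed overall level of 0.05
cutoff  <- qt(1 - 0.05 / (2 * n), df = df.residual(fit) - 1)
flagged <- which(abs(r_stud) > cutoff)

flagged
```

Leverage values from hatvalues(fit) and Cook's distances from cooks.distance(fit) are natural companions to this check when the question is influence rather than outlyingness.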