STAT 516 hw 3
Download this data set and store it in the folder containing the .qmd file for your homework assignment. The data set contains the self-reported heights (in feet and inches), lengths of index and pinky fingers (in millimeters), shoe size, and shoe size gender (“m”/“w”) of several students.
The code below imports the data into R and converts one “uk” shoe size to a “w” size according to a sizing chart found online. It also converts the heights to centimeters and creates a data frame hg. Run this code to get started.
# import the data
hg0 <- read.table("heights.csv", sep = ",", header = TRUE)
# clean one data point: recode the single "uk" shoe size as a women's size 9
# https://www.grivetoutdoors.com/pages/shoe-size-chart
hg0$shoe[hg0$shoe_wm == 'uk'] <- 9
hg0$shoe_wm[hg0$shoe_wm == 'uk'] <- 'w'
# create data frame for analysis
hg <- data.frame(height = (hg0$ft*12 + hg0$in.)*2.54, # get heights in cm
                 ind_mm = hg0$ind_mm,
                 pnk_mm = hg0$pnk_mm,
                 shoe = hg0$shoe,
                 shoe_wm = hg0$shoe_wm)
# view the first few rows of the data frame
head(hg)
The goal is to use a multiple linear regression model to predict a person's height from index and pinky finger lengths, shoe size, and shoe size gender.
1.
Make a figure which shows scatterplots for all pairs of variables in the data set. Comment on which pairs of variables appear to be highly correlated.
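As a rough illustration of the mechanics only, the sketch below draws a scatterplot matrix and a correlation matrix for simulated data. The data frame `hg_sim` is a hypothetical stand-in with the same column names as `hg`; its values are made up and say nothing about the real data set. Note that `shoe_wm` is stored as a factor here; a character column may need converting before it can be passed to `pairs()`.

```r
# Simulated stand-in for hg (all values are made up, not the homework data)
set.seed(516)
n <- 40
shoe_wm <- rep(c("m", "w"), each = n / 2)
height <- ifelse(shoe_wm == "m", 178, 165) + rnorm(n, sd = 6)
hg_sim <- data.frame(
  height  = height,
  ind_mm  = 40 + 0.2 * height + rnorm(n, sd = 4),
  pnk_mm  = 25 + 0.2 * height + rnorm(n, sd = 4),
  shoe    = (height - 100) / 8 + rnorm(n, sd = 0.7),
  shoe_wm = factor(shoe_wm)  # factor, so pairs() plots it by its integer codes
)

# scatterplots for all pairs of variables
pairs(hg_sim)

# pairwise correlations among the numeric columns
cor_mat <- cor(hg_sim[, c("height", "ind_mm", "pnk_mm", "shoe")])
round(cor_mat, 2)
```

With the real data, the same two calls would be applied to `hg` after converting `shoe_wm` with `factor()` if needed.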
2.
Fit a multiple linear regression model for predicting height from all the other variables in the data set: index and pinky finger lengths, shoe size, and shoe size gender. Then:
2.a
Report the estimated value of the regression coefficient for each covariate.
2.b
Give the value of the estimated standard error
2.c
Give the value of the test statistic
2.d
Give the p value for testing
2.e
Give an interpretation to the estimated coefficient
2.f
Give an interpretation to the estimated coefficient
2.g
Do the index and pinky finger lengths appear to be important predictors of height?
2.h
Give an estimate of
2.i
Produce a normal quantile-quantile plot of the residuals as well as a residuals versus fitted values plot. Comment on whether you believe the assumptions of the multiple linear regression model to be satisfied.
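The quantities asked for above can all be read from standard `lm()` output. The following is a minimal sketch on simulated data, not a solution: `hg_sim` is a hypothetical stand-in for `hg` with made-up values, and `fit_full` and `coef_tab` are names chosen here for illustration.

```r
# Simulated stand-in for hg (all values are made up, not the homework data)
set.seed(516)
n <- 40
shoe_wm <- rep(c("m", "w"), each = n / 2)
height <- ifelse(shoe_wm == "m", 178, 165) + rnorm(n, sd = 6)
hg_sim <- data.frame(
  height  = height,
  ind_mm  = 40 + 0.2 * height + rnorm(n, sd = 4),
  pnk_mm  = 25 + 0.2 * height + rnorm(n, sd = 4),
  shoe    = (height - 100) / 8 + rnorm(n, sd = 0.7),
  shoe_wm = factor(shoe_wm)
)

# fit the model with all four covariates
fit_full <- lm(height ~ ind_mm + pnk_mm + shoe + shoe_wm, data = hg_sim)

# coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- coef(summary(fit_full))
coef_tab

# each t value is the estimate divided by its standard error
coef_tab[, "Estimate"] / coef_tab[, "Std. Error"]

# residual standard error (the estimate of the error standard deviation)
sigma(fit_full)

# diagnostic plots: normal Q-Q plot and residuals versus fitted values
par(mfrow = c(1, 2))
qqnorm(resid(fit_full))
qqline(resid(fit_full))
plot(fitted(fit_full), resid(fit_full),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The row named `shoe_wmw` is the coefficient for the indicator of a women's size, with "m" as the reference level.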
3.
Fit a simple linear regression model using only the shoe size gender covariate. Then:
3.a
Give an interpretation of the estimated regression coefficient for the shoe size gender covariate.
3.b
Why does this covariate appear to have a different effect when it is the sole covariate in the model?
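When a two-level categorical covariate is the only predictor, the fitted coefficients have a simple relationship to the group means. The sketch below checks this numerically on simulated data; `hg_sim` is a hypothetical stand-in for `hg` with made-up values, so the numbers themselves carry no meaning for the assignment.

```r
# Simulated stand-in for hg (all values are made up, not the homework data)
set.seed(516)
n <- 40
shoe_wm <- rep(c("m", "w"), each = n / 2)
hg_sim <- data.frame(
  height  = ifelse(shoe_wm == "m", 178, 165) + rnorm(n, sd = 6),
  shoe_wm = factor(shoe_wm)
)

# simple linear regression on shoe size gender alone
fit_wm <- lm(height ~ shoe_wm, data = hg_sim)
coef(fit_wm)

# sample mean height within each group
group_means <- tapply(hg_sim$height, hg_sim$shoe_wm, mean)

# the intercept equals the mean of the reference group ("m") and the
# shoe_wmw coefficient equals the difference in group means (w minus m)
intercept  <- unname(coef(fit_wm)["(Intercept)"])
slope      <- unname(coef(fit_wm)["shoe_wmw"])
diff_means <- unname(group_means["w"] - group_means["m"])
all.equal(intercept, unname(group_means["m"]))
all.equal(slope, diff_means)
```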
4.
Fit a multiple linear regression model using only the index and pinky finger lengths as predictors of height.
4.a
Does either covariate in this model appear to be significantly related to height?
4.b
What proportion of the total variation in heights does this model explain?
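The proportion of variation explained is the R-squared value reported by `summary()`. As a sketch on simulated data (`hg_sim` is a hypothetical stand-in for `hg`, with made-up values), the code below extracts it and recomputes it from the residual and total sums of squares.

```r
# Simulated stand-in for hg (all values are made up, not the homework data)
set.seed(516)
n <- 40
height <- rep(c(178, 165), each = n / 2) + rnorm(n, sd = 6)
hg_sim <- data.frame(
  height = height,
  ind_mm = 40 + 0.2 * height + rnorm(n, sd = 4),
  pnk_mm = 25 + 0.2 * height + rnorm(n, sd = 4)
)

# model using only the two finger lengths
fit_fingers <- lm(height ~ ind_mm + pnk_mm, data = hg_sim)

# R-squared as reported by summary()
r2 <- summary(fit_fingers)$r.squared

# the same quantity by hand: 1 - SSE / SST
sse <- sum(resid(fit_fingers)^2)
sst <- sum((hg_sim$height - mean(hg_sim$height))^2)
r2_by_hand <- 1 - sse / sst

c(summary = r2, by_hand = r2_by_hand)
```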
5.
Fit a multiple linear regression model using only the shoe size and shoe size gender covariates.
5.a
Does either covariate in this model appear to be significantly related to height?
5.b
What proportion of the total variation in heights does this model explain?
6.
A forensic team analyzes a shoe print and a hand print, presumably left by the same person: The shoe print belongs to a size
6.a
If the forensic team uses the models fitted above to make guesses about the height of the person who left the prints, will they be extrapolating beyond the range of the observed data? Explain your answer.
6.b
Give an interval such that the forensic team can be
6.c
Give an interval such that the forensic team can be
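For reference, `predict()` can return both a confidence interval for the mean height at given covariate values and a prediction interval for an individual's height. The sketch below is an illustration under stated assumptions: `hg_sim` is simulated with made-up values, the measurements in `new_obs` are hypothetical and not the ones from the problem, and the 95% level is chosen only for the example. It also includes a simple per-variable range check for extrapolation.

```r
# Simulated stand-in for hg (all values are made up, not the homework data)
set.seed(516)
n <- 40
shoe_wm <- rep(c("m", "w"), each = n / 2)
height <- ifelse(shoe_wm == "m", 178, 165) + rnorm(n, sd = 6)
hg_sim <- data.frame(
  height  = height,
  ind_mm  = 40 + 0.2 * height + rnorm(n, sd = 4),
  pnk_mm  = 25 + 0.2 * height + rnorm(n, sd = 4),
  shoe    = (height - 100) / 8 + rnorm(n, sd = 0.7),
  shoe_wm = factor(shoe_wm)
)
fit_full <- lm(height ~ ind_mm + pnk_mm + shoe + shoe_wm, data = hg_sim)

# hypothetical measurements (assumed values, not those in the problem)
new_obs <- data.frame(
  ind_mm  = 75,
  pnk_mm  = 60,
  shoe    = 9,
  shoe_wm = factor("m", levels = levels(hg_sim$shoe_wm))
)

# per-variable range check: is each value inside the observed range?
# (being inside every marginal range does not rule out an unusual combination)
num_vars <- c("ind_mm", "pnk_mm", "shoe")
in_range <- sapply(num_vars, function(v) {
  new_obs[[v]] >= min(hg_sim[[v]]) & new_obs[[v]] <= max(hg_sim[[v]])
})
in_range

# confidence interval for the mean height at these covariate values
ci_mean <- predict(fit_full, newdata = new_obs,
                   interval = "confidence", level = 0.95)

# prediction interval for the height of one new individual
pi_new <- predict(fit_full, newdata = new_obs,
                  interval = "prediction", level = 0.95)

rbind(confidence = ci_mean[1, ], prediction = pi_new[1, ])
```

Both intervals share the same center (`fit`); the prediction interval is wider because it also accounts for the variation of an individual around the mean.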
7.
Suppose there is no shoe print, but only a hand print with index and pinky fingers measuring
7.a
Give an interval such that the forensic team can be
7.b
Give an interval such that the forensic team can be
8.
Suppose there is no hand print, but only a shoe print belonging to a size 8 women’s shoe:
8.a
Give an interval such that the forensic team can be
8.b
Give an interval such that the forensic team can be
9.
Answer the following based on careful study of the preceding model output and confidence and prediction intervals:
9.a
If a shoe print is found, does a hand print provide useful additional accuracy in guessing the height of the person leaving the prints?
9.b
If a hand print is found, does a shoe print provide useful additional accuracy in guessing the height of the person leaving the prints?
9.c
If only a hand print is found, should the forensic team bother trying to use the index and pinky finger lengths to guess the height of the person who left it?