hg0 <- read.table(pathtofile, sep = ",", header = TRUE) # edit pathtofile to point to the data file
colnames(hg0) <- c("ft","in","ind","pnk","shoe","shoe_wm","class")
keep <- which(hg0$shoe_wm %in% c("m","w")) # remove a "uk" shoe size
hg <- hg0[keep,]
# create 0,1 response
hg$w <- ifelse(hg$shoe_wm == "w", 1, 0)
# convert heights to cm
hg$hgt <- (hg$ft*12 + hg$`in`)*2.54
# make STAT 515 and STAT 516 data frames
hg515 <- hg[hg$class == "STAT_515_sp_2026",]
hg516 <- hg[hg$class == "STAT_516_sp_2026",]

STAT 516 hw 10
Students in the STAT 515 and STAT 516 courses completed a survey in which they reported their heights (in feet and inches), index and pinky finger lengths (mm), shoe size, and shoe size gender (“w”/“m”). The data are recorded in this file.
Here we will consider using logistic regression to predict the reported shoe size gender (“w”/“m”) based on the other reported measurements. The R code above reads in and cleans the data, constructing a response variable equal to 1 if the reported shoe size gender is “w” and 0 if it is “m”.
1. Plot the responses against the index finger lengths for the STAT 516 students.
2. Fit a logistic regression model to the STAT 516 student responses for predicting the gender response “w” from index finger length. Report the slope coefficient and make a scatter plot of the responses against the index finger lengths with the fitted probabilities overlaid; also overlay the curve on which these fitted probabilities lie.
3. Comment on the strength of evidence in the data for a relationship between gender and index finger length. Explain your answer and give a careful interpretation of the estimated slope coefficient.
4. Make a plot of the ROC curve for classifying STAT 516 students as wearing “w” or “m” shoes with the fitted logistic regression model using index finger length as the predictor.
5. Using the model fit on only the STAT 516 students, obtain predicted probabilities for the STAT 515 students. Then report the proportion of STAT 515 students correctly classified as “w” or “m” under the rule which assigns the class “w” when the predicted probability is at least
6. Now fit a model on the STAT 516 students which uses height to predict the gender response “w”. Then, as in the previous part, obtain predicted probabilities based on this model for the STAT 515 students and report the proportion of these correctly classified as “w” or “m” under the rule which assigns the class “w” when the predicted probability is at least
7. State whether you believe index finger length or height is a better predictor of gender. Explain your answer.
8. Investigate whether one should include height, index finger length, and pinky finger length all together in one model to predict gender. Report your findings.
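
The general workflow behind the parts above (fitting a logistic regression, overlaying fitted probabilities, tracing an ROC curve, and classifying a hold-out sample) can be sketched on simulated data. This is only an illustration under stated assumptions, not a solution to the exercises: the data frames `train` and `test`, the predictor `x`, the simulation coefficients, and the cutoff of 0.5 are all hypothetical and are not taken from the survey data or from the assignment.

```r
# Simulated stand-in data (hypothetical; NOT the survey data)
set.seed(1)
sim <- function(n) {
  x <- rnorm(n, mean = 70, sd = 6)   # a made-up numeric predictor
  p <- plogis(14 - 0.2 * x)          # assumed true P(w = 1 | x)
  data.frame(x = x, w = rbinom(n, 1, p))
}
train <- sim(200)  # plays the role of the sample used for fitting
test  <- sim(100)  # plays the role of the hold-out sample

# Fit the logistic regression of the 0/1 response w on x
fit <- glm(w ~ x, family = binomial, data = train)
b <- coef(fit)     # b[1] = intercept estimate, b[2] = slope estimate

# Responses against x, with fitted probabilities and the fitted curve
plot(train$x, train$w, xlab = "x", ylab = "w")
points(train$x, fitted(fit), pch = 19, cex = 0.5)
curve(plogis(b[1] + b[2] * x), add = TRUE)

# ROC curve: sweep the cutoff over the fitted probabilities
phat <- fitted(fit)
cuts <- c(Inf, sort(unique(phat), decreasing = TRUE))
tpr <- sapply(cuts, function(cc) mean(phat[train$w == 1] >= cc))
fpr <- sapply(cuts, function(cc) mean(phat[train$w == 0] >= cc))
plot(fpr, tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line for a classifier with no skill

# Hold-out classification with an ASSUMED cutoff of 0.5
ptest <- predict(fit, newdata = test, type = "response")
pred <- ifelse(ptest >= 0.5, 1, 0)
acc <- mean(pred == test$w)  # proportion correctly classified
```

Using `type = "response"` in `predict()` returns probabilities rather than values on the log-odds scale, which is what a probability cutoff rule needs. Computing the ROC points by hand, as here, avoids depending on an add-on package; a package-based ROC function would serve equally well.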