STAT 516 hw 2
Twenty-four students recorded their heights and the lengths of their index fingers. The measurements, in centimeters and millimeters, respectively, are read into R using the code below:
# height (cm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
# length of index finger (mm)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)
# sample size
n <- length(Y)
It is of interest to use the simple linear regression model to predict the height of a person based on the length of his or her index finger.
1)
Make a scatterplot of the heights versus the index finger lengths with the least-squares line overlaid. Report the intercept and slope of the least-squares line as well as the value of Pearson’s correlation coefficient.
plot(Y ~ x,
xlab = "Length of index finger (mm)",
ylab = "Height (cm)")
rxY <- cor(x,Y)
xbar <- mean(x)
Ybar <- mean(Y)
b1 <- rxY * sd(Y) / sd(x)
b0 <- Ybar - b1 * xbar
abline(b0, b1)
Intercept: 68.13919
Slope: 1.441529
Pearson's correlation coefficient: 0.7010398
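As a cross-check, here is a sketch using base R's `lm()`, which fits the same least-squares line as the hand formulas above. The data vectors are re-entered so the snippet runs on its own.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)

# Hand-computed least-squares estimates, as in the answer above
b1 <- cor(x, Y) * sd(Y) / sd(x)
b0 <- mean(Y) - b1 * mean(x)

# lm() minimizes the same sum of squared residuals
fit <- lm(Y ~ x)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(b0, b1))))
coef(fit)
```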
2)
Based on the fitted simple linear regression model, to what difference in height does an additional millimeter of index finger length correspond?
This is an interpretation of the estimated slope:
each additional mm of index finger length corresponds to an
increase of about 1.441529 cm in estimated mean height.
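A small sketch of this interpretation: fitted values at two finger lengths 1 mm apart differ by exactly the estimated slope. The choice of 72 and 73 mm is arbitrary.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

# Fitted mean heights (cm) at finger lengths 1 mm apart
pred <- predict(fit, newdata = data.frame(x = c(72, 73)))
# Their difference equals the slope estimate, about 1.44 cm
unname(pred[2] - pred[1])
```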
3)
Give an estimate of the error term variance.
Yhat <- b0 + b1 * x
ehat <- Y - Yhat
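sigma2_hat <- sum(ehat^2)/(n-2)  # MSE: estimate of the error variance
sigma_hat <- sqrt(sigma2_hat)    # estimated error standard deviation, used below
Estimate of the error term variance: 78.517 (cm^2); the estimated error standard deviation is 8.861 cm.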
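Here is a sketch that cross-checks this estimate with `lm()`. The error variance is estimated by the mean squared error SSE/(n - 2). `summary()` reports the square root of that quantity as the residual standard error (`summary(fit)$sigma`), so the two should agree after squaring.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)
n <- length(Y)

# Mean squared error: estimate of the error variance sigma^2 (cm^2)
sigma2_hat <- sum(resid(fit)^2) / (n - 2)
# summary() reports sqrt(MSE), the residual standard error
stopifnot(isTRUE(all.equal(sigma2_hat, summary(fit)$sigma^2)))
c(variance = sigma2_hat, sd = sqrt(sigma2_hat))
```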
4)
Make a normal quantile-quantile plot of the residuals as well as a residuals versus fitted values plot. Then carefully explain whether you believe the assumptions of the simple linear regression model are satisfied for these data.
plot(lm(Y ~ x), which = 1)  # residuals versus fitted values
plot(lm(Y ~ x), which = 2)  # normal Q-Q plot of standardized residuals
The normal Q-Q plot is consistent with normally distributed errors.
The residuals versus fitted values plot shows fairly constant spread from
left to right, so the error variance appears to be roughly constant. There
is perhaps a weak suggestion of a pattern in the points, which could indicate
non-linearity, but it is not strong. Overall, it seems reasonable to treat
the assumptions of the simple linear regression model as satisfied for these data.
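The same two diagnostic plots can also be drawn by hand, which makes explicit what each one checks. This is a sketch in base graphics. Note that `plot(lm(...), which = 2)` uses standardized residuals, while this sketch uses raw residuals. The vertical scale therefore differs, but the overall shape should be similar.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)
e <- resid(fit)

op <- par(mfrow = c(1, 2))
# Residuals versus fitted values: look for curvature or changing spread
plot(fitted(fit), e, xlab = "Fitted height (cm)", ylab = "Residual (cm)",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# Normal Q-Q plot: points near the line are consistent with normal errors
qqnorm(e, main = "Normal Q-Q plot of residuals")
qqline(e)
par(op)
```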
5)
Give a 99% confidence interval for the slope.
alpha <- 0.01
ta2 <- qt(1 - alpha/2,n-2)
Sxx <- sum((x - xbar)**2)
se <- sigma_hat/sqrt(Sxx)
lo <- b1 - ta2 * se
up <- b1 + ta2 * se
99% confidence interval for slope: (0.560296, 2.322762)
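`confint()` from base R's stats package computes the same t-based interval, b1 ± t(0.995, n - 2) · se(b1). Here is a sketch that checks the hand calculation.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

# 99% confidence interval for the slope
ci <- confint(fit, "x", level = 0.99)
ci
```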
6)
State whether you would reject H0: β1 = 0 at the 0.05 significance level.
We would reject H0 at the 0.05 significance level. A 95% confidence
interval for the slope is narrower than the 99% CI while being centered
at the same value, b1. Since the 99% CI does not contain 0, neither
does the 95% CI. A two-sided test at the 0.05 level rejects H0 exactly
when the 95% CI excludes 0, so we would reject H0 at the 0.05 significance level.
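This argument can also be checked directly. The sketch below computes the 95% interval with `confint()` and the two-sided p-value that `summary()` reports for the slope. A 95% interval that excludes 0 corresponds to a two-sided p-value below 0.05.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

ci95 <- confint(fit, "x", level = 0.95)
print(ci95)
# Does the 95% interval exclude 0?
excludes_zero <- ci95[1, 1] > 0 || ci95[1, 2] < 0
# Two-sided p-value for H0: slope = 0, as reported by summary()
p_two_sided <- summary(fit)$coefficients["x", "Pr(>|t|)"]
c(excludes_zero = excludes_zero, p_below_0.05 = p_two_sided < 0.05)
```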
7)
Give an estimate of the mean height of persons with index finger length equal to 72 mm.
alpha <- 0.05
ta2 <- qt(1 - alpha/2,n-2)
xnew <- 72
se <- sigma_hat * sqrt(1/n + (xnew - xbar)**2/Sxx)
Ynewhat <- b0 + b1 * xnew
lo <- Ynewhat - ta2*se
up <- Ynewhat + ta2*se
Estimated mean height of persons with index finger length
equal to 72 mm: 171.9293
95% confidence interval: (167.9944,175.8642)
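`predict()` with `interval = "confidence"` should reproduce this estimate and interval. Here is a sketch.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

# Estimated mean height at 72 mm with a 95% confidence interval
ci <- predict(fit, newdata = data.frame(x = 72),
              interval = "confidence", level = 0.95)
ci
```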
8)
A hand print is found that was made by a hand with an index finger length of 72 mm. Give a 95% prediction interval for the height of the person who made the print.
alpha <- 0.05
ta2 <- qt(1 - alpha/2,n-2)
xnew <- 72
se <- sigma_hat * sqrt(1 + 1/n + (xnew - xbar)**2/Sxx)
Ynewhat <- b0 + b1 * xnew
lo <- Ynewhat - ta2*se
up <- Ynewhat + ta2*se
95% prediction interval: (153.1362, 190.7224)
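Switching to `interval = "prediction"` adds the extra 1 under the square root, which accounts for the variance of a new observation. That is why this interval is much wider than the confidence interval in question 7. Here is a sketch.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

# 95% prediction interval for a single new person with a 72 mm finger
pi95 <- predict(fit, newdata = data.frame(x = 72),
                interval = "prediction", level = 0.95)
pi95
```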
9)
Do you think the above interval would be useful in identifying the person who made the print?
Probably not. This interval contains 21 of the 24 (87.5%) heights of the
students in the class. If a similar percentage of persons in the general
population have heights in this interval, then the interval does little
to narrow down the set of persons who could have made the hand print.
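The 87.5% figure can be checked by counting how many observed heights fall inside the prediction interval. Here is a sketch.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

# Default level is 0.95
pi95 <- predict(fit, newdata = data.frame(x = 72), interval = "prediction")
inside <- Y > pi95[1, "lwr"] & Y < pi95[1, "upr"]
sum(inside)   # 21 students
mean(inside)  # 0.875
```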
10)
Give the value of the coefficient of determination for these data. Interpret the value.
SStot <- sum((Y - Ybar)**2)
SSreg <- sum((Yhat - Ybar)**2)
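Rsq <- SSreg/SStot
Coefficient of determination (R^2): 0.4914568
About 49% of the variation in height among these 24 students is explained by the linear relationship between height and index finger length; the remaining 51% is not explained by the model. In simple linear regression, R^2 is also the square of Pearson's correlation coefficient (0.7010398^2 ≈ 0.4915).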
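This sketch confirms two equivalent routes to R^2 in simple linear regression: `summary(fit)$r.squared` and the square of Pearson's correlation.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

Rsq <- summary(fit)$r.squared
# With a single predictor, R^2 equals the squared correlation
stopifnot(isTRUE(all.equal(Rsq, cor(x, Y)^2)))
Rsq
```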
11)
Give the values of the t and F test statistics for testing whether the slope is zero.
# T test statistic
Ttest <- b1 / (sigma_hat/sqrt(Sxx))
# F test statistic
SSerror <- sum((Y - Yhat)**2)
MSerror <- SSerror/ (n-2)
MSreg <- SSreg / 1
Ftest <- MSreg / MSerror  # equal to Ttest^2 in SLR
Value of T test statistic: 4.610947
Value of F test statistic: 21.26083
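Both statistics appear in standard `lm()` output. The t statistic is in the coefficient table from `summary()`, and the F statistic is in the `anova()` table. This sketch shows that they agree with the hand computations and that F = t^2 here.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

t_stat <- summary(fit)$coefficients["x", "t value"]
F_stat <- anova(fit)["x", "F value"]
# With one predictor, the F statistic is the square of the t statistic
stopifnot(isTRUE(all.equal(F_stat, t_stat^2)))
c(t = t_stat, F = F_stat)
```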
12)
Give the p-value for testing whether the slope is zero against the alternative that it is positive.
pval <- pt(Ttest, n - 2, lower.tail = FALSE)
The p-value is the area to the right of the observed test statistic
under the density of the t distribution with n - 2 = 22 degrees of
freedom. This is: 6.78375e-05.
There is strong evidence of a positive linear relationship
between index finger length and height.
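This sketch computes the same upper-tail p-value two ways. `pt(..., lower.tail = FALSE)` gives the right-tail area directly. `summary()` reports a two-sided p-value, which is twice the upper-tail area when t > 0.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

t_stat <- summary(fit)$coefficients["x", "t value"]
# Right-tail area under the t density with n - 2 degrees of freedom
p_upper <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)
# summary() gives the two-sided p-value; halve it for the one-sided test
p_two <- summary(fit)$coefficients["x", "Pr(>|t|)"]
stopifnot(isTRUE(all.equal(p_upper, p_two / 2)))
p_upper
```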
13)
Comment on whether there are any outliers in the data set. Show a plot to support your answer.
plot(lm(Y ~ x), which = 4)  # Cook's distance for each observation
There are a few observations with somewhat large Cook's
distances, but they do not appear to be very extreme outliers.
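To make "somewhat large" more concrete, this sketch lists the Cook's distances against a reference line. The 4/n cutoff is only a common rule of thumb and is my assumption; the assignment does not specify any cutoff.

```r
# Data from the assignment: heights (cm) and index finger lengths (mm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 193.04, 172.72, 177.80,
       185.42, 165.10, 175.26, 170.18, 172.72, 203.20, 167.64, 168.91,
       157.48, 180.34, 160.02, 187.96, 187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 75, 75,
       74, 88, 80, 70, 70, 74, 70, 81, 77, 78, 60, 80)
fit <- lm(Y ~ x)

cd <- cooks.distance(fit)
# Rule-of-thumb cutoff (an assumption, not from the assignment)
cutoff <- 4 / length(Y)
# Observations whose Cook's distance exceeds the cutoff
round(cd[cd > cutoff], 3)
plot(cd, type = "h", xlab = "Observation", ylab = "Cook's distance")
abline(h = cutoff, lty = 2)
```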