STAT 516 hw 2

Author

Karl Gregory

Twenty-four students recorded their heights and the lengths of their index fingers. The measurements, in centimeters and millimeters, respectively, are read into R using the code below:

# height (cm)
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64, 
       193.04, 172.72, 177.80, 185.42, 165.10, 
       175.26, 170.18, 172.72, 203.20, 167.64, 
       168.91, 157.48, 180.34, 160.02, 187.96, 
       187.96, 180.34, 160.02, 193.04)

# length of index finger (mm)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67, 
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81, 
       77, 78, 60, 80)

# sample size
n <- length(Y)

It is of interest to use the simple linear regression model to predict the height of a person based on the length of his or her index finger.

1)

Make a scatterplot of the heights versus the index finger lengths with the least-squares line overlaid. Report the intercept and slope of the least-squares line as well as the value of Pearson’s correlation coefficient.

plot(Y ~ x,
     xlab = "Length of index finger (mm)",
     ylab = "Height (cm)")

rxY <- cor(x,Y)
xbar <- mean(x)
Ybar <- mean(Y)
b1 <- rxY * sd(Y) / sd(x)
b0 <- Ybar - b1 * xbar

abline(b0,b1)

Intercept: 68.13919
Slope: 1.441529
Pearson's correlation coefficient: 0.7010398 
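As a cross-check, the same three quantities can be obtained from R's built-in `lm()` and `cor()` functions. The sketch below repeats the data so that it runs on its own; the names `fit`, `b0_lm`, `b1_lm` and `r_xY` are introduced here for illustration and are not part of the original solution.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

# Least-squares fit of height on index finger length
fit <- lm(Y ~ x)
b0_lm <- coef(fit)[["(Intercept)"]]
b1_lm <- coef(fit)[["x"]]

# Pearson's correlation coefficient
r_xY <- cor(x, Y)

print(c(intercept = b0_lm, slope = b1_lm, correlation = r_xY))
```

The values printed should agree with the intercept, slope and correlation computed from the formulas above.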

2)

Based on the fitted simple linear regression model, to what difference in height does an additional millimeter of index finger length correspond?

This is an interpretation of the estimated slope: 
Each additional mm of index finger length corresponds to an 
estimated additional 1.441529 cm of height, on average.

3)

Give an estimate of the error term variance.

Yhat <- b0 + b1 * x
ehat <- Y - Yhat
sigma_hat <- sqrt(sum(ehat**2)/(n-2))
sigma2_hat <- sigma_hat**2 # variance estimate is the square of sigma_hat
Estimate of the error term variance: 78.5168
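The residual standard error stored in the `lm()` summary gives the same estimate without computing the residuals by hand. The following is a minimal sketch, with the data repeated so that it runs on its own; `fit`, `sigma_lm` and `sigma2_lm` are names assumed here for illustration.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# Residual standard error: sqrt(SSE / (n - 2))
sigma_lm <- summary(fit)$sigma

# The error term variance estimate is its square
sigma2_lm <- sigma_lm^2

print(c(sd_estimate = sigma_lm, variance_estimate = sigma2_lm))
```

Note that the standard deviation estimate and the variance estimate are on different scales (cm and cm squared), so it is worth checking which of the two a question asks for.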

4)

Make a normal quantile-quantile plot of the residuals as well as a residuals versus fitted values plot. Then carefully explain whether you believe the assumptions of the simple linear regression model are satisfied for these data.

plot(lm(Y~x),which = 1)

plot(lm(Y~x),which = 2)

The normal Q-Q plot is indicative of normality; the 
residuals versus fitted values plot shows fairly constant spread from 
left to right, so the variance appears to be roughly constant. There is 
perhaps some suggestion of a pattern in the points, which may indicate 
non-linearity, but it is quite weak. It may be safe to assume that the 
assumptions of the linear regression model are satisfied.

5)

Give a 99% confidence interval for the slope parameter β1.

alpha <- 0.01
ta2 <- qt(1 - alpha/2,n-2)
Sxx <- sum((x - xbar)**2)
se <- sigma_hat/sqrt(Sxx)
lo <- b1 - ta2 * se
up <- b1 + ta2 * se
99% confidence interval for slope: (0.560296,2.322762)
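The interval can also be checked with `confint()`, which applies the same t-based formula to a fitted `lm` object. This is a sketch with the data repeated so that it runs on its own; `fit` and `ci_slope` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# 99% confidence interval for the slope coefficient only
ci_slope <- confint(fit, parm = "x", level = 0.99)
print(ci_slope)
```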

6)

State whether you would reject H0: β1=0 versus H1: β1≠0 at the α=0.05 significance level.

We would reject H0. A 95% confidence interval for β1 is 
narrower than the 99% CI and is centered at the same value. 
Since the 99% CI does not contain 0, neither does the 95% CI, 
so we reject H0 at the 0.05 significance level.

7)

Give an estimate of the mean height of persons with index finger length equal to 72 mm. In addition, give a 95% confidence interval for this mean height.

alpha <- 0.05
ta2 <- qt(1 - alpha/2,n-2)
xnew <- 72
se <- sigma_hat * sqrt(1/n + (xnew - xbar)**2/Sxx)
Ynewhat <- b0 + b1 * xnew
lo <- Ynewhat - ta2*se
up <- Ynewhat + ta2*se
Estimated mean height of persons with index finger length 
equal to 72 mm: 171.9293 
95% confidence interval: (167.9944,175.8642)
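The same estimate and interval can be obtained from `predict()` with `interval = "confidence"`. This is a sketch with the data repeated so that it runs on its own; `fit` and `mean_ci` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# Estimated mean height at x = 72 mm with a 95% confidence interval;
# the columns of the result are fit, lwr and upr
mean_ci <- predict(fit, newdata = data.frame(x = 72),
                   interval = "confidence", level = 0.95)
print(mean_ci)
```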

8)

A hand print is found made by a hand with an index finger length of 72 mm. Give an interval which will contain with 95% probability the height of the person who made the print.

alpha <- 0.05
ta2 <- qt(1 - alpha/2,n-2)
xnew <- 72
se <- sigma_hat * sqrt(1 + 1/n + (xnew - xbar)**2/Sxx)
Ynewhat <- b0 + b1 * xnew
lo <- Ynewhat - ta2*se
up <- Ynewhat + ta2*se
95% prediction interval: (153.1362,190.7224)
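Switching `predict()` to `interval = "prediction"` adds the extra `1 +` term under the square root, which accounts for the variability of a single new observation around the mean. This is a sketch with the data repeated so that it runs on its own; `fit` and `pred_int` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# 95% prediction interval for the height of one new person with x = 72 mm
pred_int <- predict(fit, newdata = data.frame(x = 72),
                    interval = "prediction", level = 0.95)
print(pred_int)
```

The point prediction is the same as the estimated mean height; only the width of the interval changes.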

9)

Do you think the above interval would be useful in identifying the person who made the print?

This interval contains most (87.5%) of the heights of the 
students in the class. If a similar percentage of persons in 
the general population have heights in this interval, then the 
interval does little to narrow down the set of persons who 
could have made the hand print.
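The share of class heights falling inside the prediction interval can be counted directly. The sketch below recomputes the interval with `predict()` and repeats the data so that it runs on its own; `fit`, `pred_int`, `inside` and `share_inside` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)
pred_int <- predict(fit, newdata = data.frame(x = 72),
                    interval = "prediction", level = 0.95)

# Flag the observed heights that lie inside the 95% prediction interval
inside <- Y >= pred_int[1, "lwr"] & Y <= pred_int[1, "upr"]

# Proportion of the 24 students whose height is inside the interval
share_inside <- mean(inside)
print(share_inside)
```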

10)

Give the value of the coefficient of determination for these data. Interpret the value.

SStot <- sum((Y - Ybar)**2)
SSreg <- sum((Yhat - Ybar)**2)
Rsq <- SSreg/SStot 
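Coefficient of determination (R^2): 0.4914568
About 49% of the variation in height is explained by the 
linear relationship with index finger length.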
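In simple linear regression the coefficient of determination equals the square of Pearson's correlation coefficient, which gives a quick check on the sums-of-squares calculation. This is a sketch with the data repeated so that it runs on its own; `fit`, `Rsq_lm` and `Rsq_cor` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# R^2 as reported by the model summary
Rsq_lm <- summary(fit)$r.squared

# R^2 as the squared correlation between x and Y
Rsq_cor <- cor(x, Y)^2

print(c(from_lm = Rsq_lm, from_cor = Rsq_cor))
```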

11)

Give the value of the test statistic Ttest = β̂1/(σ̂/√Sxx) as well as the value of Ftest = MSReg/MSError.

# T test statistic
Ttest <- b1 / (sigma_hat/sqrt(Sxx))

# F test statistic
SSerror <- sum((Y - Yhat)**2)
MSerror <- SSerror/ (n-2)
MSreg <- SSreg / 1
Ftest <- MSreg / MSerror # equal to Ttest**2 in SLR
Value of T test statistic: 4.610947 
Value of F test statistic: 21.26083
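Both statistics appear in standard R output: the t statistic in the coefficient table of `summary()` and the F statistic in the `anova()` table. This is a sketch with the data repeated so that it runs on its own; `fit`, `t_lm` and `F_lm` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# t statistic for the slope, from the coefficient table
t_lm <- summary(fit)$coefficients["x", "t value"]

# F statistic for the regression, from the ANOVA table
F_lm <- anova(fit)$`F value`[1]

# In simple linear regression the F statistic is the square of the t statistic
print(c(t = t_lm, F = F_lm, t_squared = t_lm^2))
```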

12)

Give the p-value for testing H0: β1≤0 versus H1: β1>0 based on the value of the test statistic Ttest. Interpret your answer.

pval <- 1 - pt(Ttest,n-2)
The p-value is the area to the right of 
the test statistic under the pdf of the t distribution 
with n-2 degrees of freedom. This is: 6.78375e-05. 
There is strong evidence of a positive linear relationship 
between length of index finger and height.
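Because the observed t statistic is positive, the one-sided p-value is half of the two-sided p-value that `summary()` reports for the slope. The sketch below illustrates this, with the data repeated so that it runs on its own; `fit`, `t_lm`, `p_one_sided` and `p_two_sided` are names assumed here.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)
coef_table <- summary(fit)$coefficients
t_lm <- coef_table["x", "t value"]

# Upper-tail area under the t distribution with n - 2 degrees of freedom
p_one_sided <- pt(t_lm, df = length(Y) - 2, lower.tail = FALSE)

# Two-sided p-value reported by summary() for the slope
p_two_sided <- coef_table["x", "Pr(>|t|)"]

print(c(one_sided = p_one_sided, half_two_sided = p_two_sided / 2))
```

Using `lower.tail = FALSE` rather than `1 - pt(...)` avoids loss of precision when the tail area is very small.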

13)

Comment on whether there are any outliers in the data set. Show a plot to support your answer.

plot(lm(Y~x),which = 4)

There are a few observations with somewhat large Cook's 
distances, but they do not appear to be very extreme outliers.
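The Cook's distances shown in the plot can also be extracted numerically with `cooks.distance()`, which makes it easier to see which observations stand out. This is a sketch with the data repeated so that it runs on its own; `fit`, `cd` and `top3` are names assumed here, and listing the three largest values is an arbitrary choice for illustration rather than a formal outlier rule.

```r
# Data repeated from the top of the assignment so the block is self-contained
Y <- c(162.56, 162.56, 172.72, 165.10, 167.64,
       193.04, 172.72, 177.80, 185.42, 165.10,
       175.26, 170.18, 172.72, 203.20, 167.64,
       168.91, 157.48, 180.34, 160.02, 187.96,
       187.96, 180.34, 160.02, 193.04)
x <- c(75, 70, 70, 68, 71, 78, 80, 73, 68, 67,
       75, 75, 74, 88, 80, 70, 70, 74, 70, 81,
       77, 78, 60, 80)

fit <- lm(Y ~ x)

# Cook's distance for each of the 24 observations
cd <- cooks.distance(fit)

# Indices of the three observations with the largest Cook's distances
top3 <- order(cd, decreasing = TRUE)[1:3]

# Show those observations alongside their Cook's distances
print(data.frame(obs = top3, x = x[top3], Y = Y[top3], cooks_d = cd[top3]))
```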