STAT 704 Fall 2016
--------------------

Homework 2
------------

Please write your answers neatly and clearly!

KNNL = (the Kutner, Nachtsheim, Neter, Li textbook)


1.  In a random sample of 796 teenagers, the mean number of hours 
per week of TV watching was 13.2, with a standard deviation of 1.6.  
Find (AND INTERPRET) a 90% confidence 
interval for the true mean weekly TV-watching time for teenagers.
Why can we use a t CI procedure in this problem?
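
As a starting point, here is a minimal R sketch of a t-based interval computed 
from summary statistics alone.  The variable names (n, xbar, s, and so on) are 
illustrative choices, not part of the assignment.

```r
# Summary statistics given in Problem 1
n    <- 796
xbar <- 13.2
s    <- 1.6

conf.level <- 0.90
alpha      <- 1 - conf.level

# t critical value with n - 1 degrees of freedom
t.crit <- qt(1 - alpha / 2, df = n - 1)

# Margin of error and interval endpoints
moe <- t.crit * s / sqrt(n)
ci  <- c(lower = xbar - moe, upper = xbar + moe)
print(round(ci, 3))
```

The interpretation and the justification for the t procedure are still up to you.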

2.  An engineer wants to calibrate a pH meter.  She uses the meter 
to measure the pH in 14 neutral substances (pH = 7.0), obtaining the 
following data:
6.986, 7.009, 7.028, 7.037, 7.028, 7.009, 7.053, 7.028, 7.011, 7.021, 7.037, 7.070, 7.058, 7.013

(a) Use a graph to determine whether the assumption of normality for 
these data is reasonable.

(b) Test (at a 0.05 significance level) whether the true mean pH reading
for neutral substances differs from 7.0.  Use R or SAS and report the 
P-value of your test.
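
One possible way to set this up in R is sketched below, assuming a normal Q-Q 
plot is the graph of choice (a histogram or box plot would be other options).  
The object names ph and ph.test are illustrative.

```r
# pH readings from Problem 2
ph <- c(6.986, 7.009, 7.028, 7.037, 7.028, 7.009, 7.053,
        7.028, 7.011, 7.021, 7.037, 7.070, 7.058, 7.013)

# (a) Normal Q-Q plot with a reference line
qqnorm(ph)
qqline(ph)

# (b) Two-sided one-sample t-test of H0: mu = 7.0
ph.test <- t.test(ph, mu = 7.0, alternative = "two.sided")
print(ph.test$p.value)
```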

3.  Suppose a sample of 10 types of compact cars reveals the following 
one-day rental prices (in dollars) for Hertz and Thrifty, respectively:

Hertz:  37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21
Thrifty:  29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44

(a) Explain why this is a paired-sample problem.

(b) Use a graph to determine whether the assumption of normality is reasonable.

(c) Test (at a 0.05 significance level) with a t-test whether Thrifty has 
a lower true mean rental rate than Hertz.  Use R or SAS and report the 
P-value of your test.
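
A hedged R sketch for parts (b) and (c) follows.  It assumes the normality check 
is made on the paired differences (Thrifty minus Hertz) and uses illustrative 
object names.

```r
# One-day rental prices from Problem 3, in the same car-type order
hertz   <- c(37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21)
thrifty <- c(29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44)

# (b) Q-Q plot of the paired differences
diffs <- thrifty - hertz
qqnorm(diffs)
qqline(diffs)

# (c) Paired t-test; "less" means the alternative is mean(thrifty - hertz) < 0
rental.test <- t.test(thrifty, hertz, paired = TRUE, alternative = "less")
print(rental.test$p.value)
```

Note that the direction of the alternative depends on the order of the first two 
arguments to t.test.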

4.  Examine the data in Problem 16.7 on page 723 of the KNNL textbook.  We will 
only deal with the data on the first two lines ("Low" and "Moderate").

(a) Use R or SAS to prepare side-by-side box plots for the two samples.
Do the spreads seem to differ across samples?

(b) Test (at a 0.05 significance level) with a t-test whether the firms rated 
"Moderate" have a significantly higher mean productivity improvement than those 
rated "Low".  Use R or SAS and report the P-value of your test.

(c) Using R or SAS, find (and interpret) a 90% CI for the difference in mean 
productivity improvement between firms rated "Moderate" and those rated "Low".

5. A cereal company claims its boxes contain 445 grams of cereal.  A random sample
of 15 boxes produces the following measurements:
446.92 447.48 436.14 443.68 441.82 445.80 435.49 445.87 445.81 440.28
433.12 436.76 432.55 446.32 444.62

(a)  Use a graph to determine whether the assumption of normality is reasonable.

(b)  Using an appropriate nonparametric test (at a 0.05 significance level), 
determine whether the center of the distribution of cereal weights is 445 grams.
Use R or SAS and report the P-value of your test.
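
The sketch below shows two candidate nonparametric tests in base R: a sign test 
built from binom.test, and the Wilcoxon signed-rank test via wilcox.test (which 
additionally assumes a symmetric distribution).  Which one is appropriate depends 
on what the graph in part (a) suggests; the object names are illustrative.

```r
# Cereal weights (grams) from Problem 5
weights <- c(446.92, 447.48, 436.14, 443.68, 441.82, 445.80, 435.49, 445.87,
             445.81, 440.28, 433.12, 436.76, 432.55, 446.32, 444.62)

# (a) Normal Q-Q plot
qqnorm(weights)
qqline(weights)

# (b) Sign test of H0: median = 445, dropping any values equal to 445
n.above   <- sum(weights > 445)
n.used    <- sum(weights != 445)
sign.test <- binom.test(n.above, n.used, p = 0.5, alternative = "two.sided")
print(sign.test$p.value)

# (b, alternative) Wilcoxon signed-rank test; exact = FALSE requests the
# normal approximation, which avoids a warning if absolute differences tie
wsr.test <- wilcox.test(weights, mu = 445, alternative = "two.sided",
                        exact = FALSE)
print(wsr.test$p.value)
```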

6.  For the special case of n1 = n2 = n, show that the test statistic for the 
two-independent-samples t-test (assuming equal population variances) has a 
t-distribution under the null hypothesis.  What are the degrees of freedom for
this t-distribution?
(You can use the fact that the sum of two independent chi-square r.v.'s is a 
chi-square with degrees of freedom equaling the sum of the d.f. for the 
individual r.v.'s.)


7. The following R code computes the power (the probability of rejecting H0) 
of the t-test (first number output) and the sign test (second number output) for testing
H0: mu = 0 against the two-sided alternative, using a nominal significance level of 0.05.

The first section of code is for data that come from a normal distribution, while the 
second section of code is for data that come from a Cauchy (very heavy-tailed) distribution.

Note that the power depends on the true value of mu in the population, which is given by 
the argument "true.mean" in the code below.  It also depends on the sample size, which is given 
by "samp.size".  So 
tests.power.norm(true.mean=0.5,samp.size=20)
will calculate the power of the t-test and sign test for H0: mu = 0 on 20 data values that actually 
come from a Normal distribution with mean 0.5.
And 
tests.power.cauchy(true.mean=0.5,samp.size=20)
will calculate the power of the t-test and sign test for H0: mu = 0 on 20 data values that actually 
come from a Cauchy distribution that is shifted to have center 0.5 (technically the median).

Note that setting true.mean=0 (where H0 is true) will produce the actual significance level of the
test(s), the probability of rejecting H0 when it is actually true.

Run the code for various values of true.mean and samp.size to investigate the powers and 
actual significance levels of the two tests under various data conditions.  Write a paragraph summarizing
your findings. [Note:  You must copy/paste our quantile.test function into R first.]

#######################################################################
# Power when the data are truly Normal:
#######################################################################

tests.power.norm = function(samp.size=10,nsim=1000,true.mean=0,sds=1){
    lower = qt(.025,df=samp.size - 1)
    upper = qt(.975,df=samp.size - 1)
    ts = replicate(nsim,
       t.test(rnorm(samp.size,mean=true.mean,sd=sds))$statistic)
    myps = replicate(nsim,
       quantile.test(rnorm(samp.size,mean=true.mean,sd=sds))$p.value)
    cbind(sum(ts < lower | ts > upper) / nsim, sum(myps < .05)/nsim )
}

#######################################################################
# Power when the data are truly Cauchy (HEAVY tails), i.e., t with df=1:
#######################################################################

tests.power.cauchy = function(samp.size=10,nsim=1000,true.mean=0,sds=1){
    lower = qt(.025,df=samp.size - 1)
    upper = qt(.975,df=samp.size - 1)
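    # rcauchy with location=true.mean gives a Cauchy whose center (median) is true.mean
    ts = replicate(nsim,
       t.test(rcauchy(samp.size,location=true.mean,scale=sds))$statistic)
    myps = replicate(nsim,
       quantile.test(rcauchy(samp.size,location=true.mean,scale=sds))$p.value)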
    cbind(sum(ts < lower | ts > upper) / nsim, sum(myps < .05)/nsim )
}
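
#######################################################################
# Optional: a self-contained sketch of the same kind of simulation
#######################################################################

The block below is a rough, self-contained sketch of how such a power study could 
be organized over a grid of shifts.  It does NOT use the course's quantile.test 
function; sign.pvalue and sim.power are hypothetical helpers written for 
illustration, with the sign test computed from binom.test.  Unlike the functions 
above, it applies both tests to the same simulated samples.

```r
# Hypothetical helper (not the course's quantile.test): two-sided sign test
# p-value for H0: median = m0, computed with binom.test.
sign.pvalue <- function(x, m0 = 0) {
    x <- x[x != m0]                      # drop values exactly equal to m0
    binom.test(sum(x > m0), length(x), p = 0.5)$p.value
}

# Hypothetical helper: estimated rejection rates of the t-test and the sign
# test at level alpha, where rgen(n) draws one sample of size n.
sim.power <- function(rgen, samp.size = 10, nsim = 1000, alpha = 0.05) {
    rejections <- replicate(nsim, {
        x <- rgen(samp.size)
        c(t.test    = t.test(x)$p.value < alpha,
          sign.test = sign.pvalue(x) < alpha)
    })
    rowMeans(rejections)                 # proportion of rejections per test
}

set.seed(704)                            # arbitrary seed, for reproducibility
shifts <- c(0, 0.5, 1)

normal.power <- t(sapply(shifts, function(m)
    sim.power(function(n) rnorm(n, mean = m), samp.size = 20)))
cauchy.power <- t(sapply(shifts, function(m)
    sim.power(function(n) rcauchy(n, location = m), samp.size = 20)))

rownames(normal.power) <- rownames(cauchy.power) <- paste0("shift=", shifts)
print(normal.power)
print(cauchy.power)
```

The shift=0 rows estimate the actual significance levels; the other rows estimate power.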