STAT 704 Fall 2016
--------------------
Homework 2
------------

Please write your answers neatly and clearly!

KNNL = (the Kutner, Nachtsheim, Neter, Li textbook)

1. A random sample of 796 teenagers revealed that in this sample, the mean number of hours per week of TV watching was 13.2, with a standard deviation of 1.6. Find (AND INTERPRET) a 90% confidence interval for the true mean weekly TV-watching time for teenagers. Why can we use a t CI procedure in this problem?

2. An engineer wants to calibrate a pH meter. She uses the meter to measure the pH in 14 neutral substances (pH = 7.0), obtaining the following data:

   6.986, 7.009, 7.028, 7.037, 7.028, 7.009, 7.053, 7.028, 7.011, 7.021, 7.037, 7.070, 7.058, 7.013

   (a) Use a graph to determine whether the assumption of normality for these data is reasonable.
   (b) Test (at a 0.05 significance level) whether the true mean pH reading for neutral substances differs from 7.0. Use R or SAS and report the P-value of your test.

3. Suppose a sample of 10 types of compact cars reveals the following one-day rental prices (in dollars) for Hertz and Thrifty, respectively:

   Hertz:   37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21
   Thrifty: 29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44

   (a) Explain why this is a paired-sample problem.
   (b) Use a graph to determine whether the assumption of normality is reasonable.
   (c) Test (at a 0.05 significance level) with a t-test whether Thrifty has a lower true mean rental rate than Hertz. Use R or SAS and report the P-value of your test.

4. Examine the data in Problem 16.7 on page 723 of the KNNL textbook. We will only deal with the data on the first two lines ("Low" and "Moderate").

   (a) Use R or SAS to prepare side-by-side box plots for the two samples. Do the spreads seem to differ across samples?
   (b) Test (at a 0.05 significance level) with a t-test whether the firms rated "Moderate" have a significantly higher mean productivity improvement than those rated "Low". Use R or SAS and report the P-value of your test.
   (c) Using R or SAS, find (and interpret) a 90% CI for the difference in mean productivity improvement between firms rated "Moderate" and those rated "Low".

5. A cereal company claims its boxes contain 445 grams of cereal. A random sample of 15 boxes produces the following measurements:

   446.92 447.48 436.14 443.68 441.82 445.80 435.49 445.87 445.81 440.28 433.12 436.76 432.55 446.32 444.62

   (a) Use a graph to determine whether the assumption of normality is reasonable.
   (b) Using an appropriate nonparametric test (at a 0.05 significance level), determine whether the center of the distribution of cereal weights is 445 grams. Use R or SAS and report the P-value of your test.

6. For the special case of n1 = n2 = n, show that the test statistic for the two-independent-samples t-test (assuming equal population variances) has a t-distribution under the null hypothesis. What are the degrees of freedom for this t-distribution? (You can use the fact that the sum of two independent chi-square r.v.'s is a chi-square with degrees of freedom equaling the sum of the d.f. for the individual r.v.'s.)

7. The following R code computes the power (the probability of rejecting H0) of the t-test (first number output) and the sign test (second number output) for testing H0: mu = 0 against the two-sided alternative, using a nominal significance level of 0.05. The first section of code is for data that come from a normal distribution, while the second section of code is for data that come from a Cauchy (very heavy-tailed) distribution. Note that the power depends on the true value of mu in the population, which is given by the argument "true.mean" in the code below. It also depends on the sample size, which is given by "samp.size".
So tests.power.norm(true.mean=0.5,samp.size=20) will calculate the power of the t-test and sign test for H0: mu = 0 on 20 data values that actually come from a Normal distribution with mean 0.5. And tests.power.cauchy(true.mean=0.5,samp.size=20) will calculate the power of the t-test and sign test for H0: mu = 0 on 20 data values that actually come from a Cauchy distribution that is shifted to have center 0.5 (technically the median).

Note that setting true.mean=0 (where H0 is true) will produce the actual significance level of the test(s), that is, the probability of rejecting H0 when it is actually true.

Run the code for various values of true.mean and samp.size to investigate the powers and actual significance levels of the two tests under various data conditions. Write a paragraph summarizing your findings.

[Note: You must copy/paste our quantile.test function into R first.]

#######################################################################
# Power when the data are truly Normal:
#######################################################################
tests.power.norm = function(samp.size=10,nsim=1000,true.mean=0,sds=1){
  # Critical values of the two-sided t-test at nominal level 0.05
  lower = qt(.025,df=samp.size - 1)
  upper = qt(.975,df=samp.size - 1)
  # t statistics from nsim simulated Normal samples
  ts = replicate(nsim, t.test(rnorm(samp.size,mean=true.mean,sd=sds))$statistic)
  # Sign-test P-values from nsim separately simulated Normal samples
  myps = replicate(nsim, quantile.test(rnorm(samp.size,mean=true.mean,sd=sds))$p.value)
  # Proportion of rejections: t-test (first), sign test (second)
  cbind(sum(ts < lower | ts > upper) / nsim, sum(myps < .05)/nsim )
}

#######################################################################
# Power when the data are truly Cauchy data (HEAVY tails), i.e., t with df=1:
#######################################################################
tests.power.cauchy = function(samp.size=10,nsim=1000,true.mean=0,sds=1){
  # Critical values of the two-sided t-test at nominal level 0.05
  lower = qt(.025,df=samp.size - 1)
  upper = qt(.975,df=samp.size - 1)
  # Cauchy (t with df=1) draws shifted to have median true.mean.
  # A location shift is used rather than ncp=true.mean, because a noncentral t
  # is not a shifted Cauchy. The argument sds is not used in this function.
  ts = replicate(nsim, t.test(rt(samp.size,df=1) + true.mean)$statistic)
  myps = replicate(nsim, quantile.test(rt(samp.size,df=1) + true.mean)$p.value)
  # Proportion of rejections: t-test (first), sign test (second)
  cbind(sum(ts < lower | ts > upper) / nsim, sum(myps < .05)/nsim )
}
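
The simulation functions above call quantile.test, which is not reproduced in this handout. The following is only a rough sketch of the kind of test involved, assuming an exact two-sided sign test of H0: median = 0 built on R's binom.test. The name sign.test.sketch and its arguments are hypothetical; this is not the course's quantile.test, whose interface and tie handling may differ.

```r
# Hypothetical stand-in (an assumption, NOT the course's quantile.test):
# exact two-sided sign test of H0: median = m0.
sign.test.sketch = function(x, m0 = 0) {
  x = x[x != m0]            # drop observations tied with the hypothesized median
  n.above = sum(x > m0)     # count of observations above m0
  # Under H0, n.above ~ Binomial(length(x), 0.5)
  p = binom.test(n.above, length(x), p = 0.5, alternative = "two.sided")$p.value
  list(statistic = n.above, p.value = p)
}

# Estimated rejection rate of the sign test for Normal(0.5, 1) data with n = 20
set.seed(704)
pvals = replicate(1000, sign.test.sketch(rnorm(20, mean = 0.5, sd = 1))$p.value)
mean(pvals < .05)
```

Because the sign test uses only the signs of the observations, its P-value comes from a discrete binomial distribution, so its actual significance level under H0 can fall below the nominal 0.05 for small samp.size.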