---------------------------- Ex 2.1 ----------------------------

R code:

> library(oehlert)   # all data in textbook now easily available
> library(perm)      # has permTS() function
> attach(ex02.1)
> ex02.1             # always look at data to see what variables are called
> permTS(fracHigh[text=="new"], fracHigh[text=="old"], alt="great")

Gives output:

	Exact Permutation Test (network algorithm)

data:  GROUP 1 and GROUP 2
p-value = 0.6857

P-value = 0.69 > 0.05, so we accept H0: mu1=mu2 at the 5% level; we accept that the new book has the same mean score as the old book.

---------------------------- Ex 2.2 ----------------------------

R code:

> attach(ex02.2)
> library(cfcdae)    # if this doesn't work, use the following:
> source("http://people.stat.sc.edu/hansont/stat506/permsign.test.R")
> permsign.test(nImpreg, plot=TRUE)   # permutation test

Gives output:

$upper.p
[1] 0.0515

P-value = 0.052 > 0.05, so we accept H0: mu1=mu2 at the 5% level; the drug does not significantly improve fertility at the 5% level.

That's all I expected you to do; however, note that the usual t-test *does* reject at the 5% level:

> t.test(nImpreg, alt="great")

	One Sample t-test

data:  nImpreg
t = 2.0254, df = 6, p-value = 0.04461

A look at a boxplot of differences shows no departures from normality:

> boxplot(nImpreg)

A formal test of H0: data are normal gives

> shapiro.test(nImpreg)

	Shapiro-Wilk normality test

data:  nImpreg
W = 0.88114, p-value = 0.2316

We accept that the data are normal and so the t-test is valid. Good news, because we reject at the 5% level with the t-test!

---------------------------- Ex 2.3 ----------------------------

If the kids choose their own treatments, you do not have a randomized experiment. The second and third methods are also not randomized. They might give good results if you were lucky in the order in which units arrived at your office, but could give very poor results if, for example, parents enroll two children and always enroll the younger and then the older.
The fourth and fifth methods are both randomized and will tend to protect against confounding. Method five is balanced.

---------------------------- Ex 2.4 ----------------------------

R code:

> attach(ex02.4)
> permTS(nRemoved[pH=="reduced"], nRemoved[pH=="neutral"])

Gives output:

	Exact Permutation Test (network algorithm)

data:  GROUP 1 and GROUP 2
p-value = 0.4

We accept H0: mu1=mu2 at the 5% level. There is no significant difference in mean number of Copepoda removed from each microcosm during the first 14 days of snowmelt.

---------------------------- Ex 3.1 ----------------------------

R code and output:

> attach(ex03.1)
> mean(liverWt)
[1] 3.718276
> tapply(liverWt, diet, mean)
       1        2        3        4
3.745714 3.580000 3.598333 3.922500     # mu1hat, mu2hat, mu3hat, mu4hat

There are different ways to compute the treatment effects. Here's one:

> tapply(liverWt, diet, mean) - mean(liverWt)
          1           2           3           4
 0.02743842 -0.13827586 -0.11994253  0.20422414

Overall mean is 3.72%. Treatment effects are alpha1hat=0.03, alpha2hat=-0.14, alpha3hat=-0.12, and alpha4hat=0.20. This is for the constraint sum_i n_i alpha_i = 0.

> f=lm(liverWt~factor(diet))
> anova(f)
Analysis of Variance Table

Response: liverWt
             Df  Sum Sq  Mean Sq F value  Pr(>F)
factor(diet)  3 0.57821 0.192736  4.6581 0.01016 *
Residuals    25 1.03440 0.041376
---

P-value = 0.01 < 0.05; we reject that the treatment effects are zero at the 5% level, i.e. there is a significant treatment effect.

---------------------------- Ex 3.3 ----------------------------

R code and output:

> attach(ex03.3)
> f=lm(moisture~treat)   # treat is already a factor because levels are words
> anova(f)
Analysis of Variance Table

Response: moisture
          Df Sum Sq Mean Sq F value Pr(>F)
treat      3  36.18   12.06  0.9926  0.444
Residuals  8  97.20   12.15

We accept that there are no differences in treatment means, i.e. accept H0: mu1=mu2=mu3=mu4 at the 5% level because the p-value = 0.44 > 0.05. This is all I asked you to do.
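As an aside, the F statistic in an ANOVA table like the one above can be reproduced by hand from the between- and within-group sums of squares. Here is a minimal sketch using made-up data (not the ex03.3 moisture values):

```r
# Toy data, three groups of three observations each (made up for illustration)
y <- c(5.1, 4.8, 5.5, 6.0, 6.2, 5.9, 4.9, 5.0, 5.3)
g <- factor(rep(c("A", "B", "C"), each = 3))

grand <- mean(y)                         # grand mean
means <- tapply(y, g, mean)              # group means
n     <- tapply(y, g, length)            # group sizes
SSTrt <- sum(n * (means - grand)^2)      # between-group (treatment) SS
SSE   <- sum((y - means[g])^2)           # within-group (error) SS
dfTrt <- nlevels(g) - 1
dfE   <- length(y) - nlevels(g)
Fstat <- (SSTrt / dfTrt) / (SSE / dfE)   # mean squares ratio

# Should match the F value reported by anova(lm(y ~ g))
all.equal(unname(Fstat), anova(lm(y ~ g))[["F value"]][1])
```

Indexing `means[g]` by the factor picks out each observation's group mean, which is what `lm()` fits under the separate-means model.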
However, looking at the data:

> boxplot(moisture~treat)

indicates constant variance is not satisfied. If we instead use a correct test

> oneway.test(moisture~treat)

	One-way analysis of means (not assuming equal variances)

data:  moisture and treat
F = 2.4518, num df = 3.0000, denom df = 4.2124, p-value = 0.1972

The p-value drops by more than half: using the right test increases power (makes the p-value smaller). We still accept, though.

This is one case where maybe a test of constant *variance* might be more appropriate. Bartlett's test is one such test. The null is H0: sigma1=sigma2=sigma3=sigma4, and the test assumes normal data across groups (and allows for a different mean in each group too):

> bartlett.test(moisture~treat)

	Bartlett test of homogeneity of variances

data:  moisture by treat
Bartlett's K-squared = 10.226, df = 3, p-value = 0.01674

P-value = 0.017 < 0.05, so we reject constant variance at the 5% level! There are differences in moisture across treatment groups, but the variances differ, not the means. It's good to have several tools in your "statistics toolbox". Note that Bartlett's test is highly sensitive to the normality assumption; a test which is not (but might have less power) is Levene's test.

---------------------------- Pr 3.2 ----------------------------

This appears to be a balanced (equal sample size across groups), completely randomized design. Note in the problem description: "Other than the number and type of companions, the males were treated identically." Thus all other factors -- besides partners -- thought to affect lifespan were kept the same.

When trying to attach the data from the 'oehlert' package:

> attach(pr03.2)

R says something about 'treat' being "masked". That means that the data set just attached has the same column name as a previously attached data set. The 'treat' variable from Ex. 3.3 is now replaced w/ the 'treat' variable from Pr. 3.2.
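You can see the masking behavior with any two data frames sharing a column name; here is a small sketch with toy data frames (not the course data sets):

```r
# Two toy data frames that share the column name 'treat'
d1 <- data.frame(treat = c("a", "b"))
d2 <- data.frame(treat = c("x", "y"))

attach(d1)
attach(d2)    # R warns that 'treat' (from d1) is masked
treat         # now refers to d2$treat

detach(d2)    # remove d2 from the search path
treat         # d1$treat is visible again
detach(d1)
```

A safer habit is to skip attach() entirely and name the data explicitly, e.g. lm(lifespan ~ treat, data = pr03.2) or with(pr03.2, boxplot(lifespan ~ treat)); then nothing ever gets masked.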
I always look at the data first:

> boxplot(lifespan~treat)

Clearly, the males receiving eight virgin females typically live less long than the other four treatment groups, thus visually confirming the hypothesis of "reproductive cost".

> f=lm(lifespan~treat)
> anova(f)
Analysis of Variance Table

Response: lifespan
           Df Sum Sq Mean Sq F value    Pr(>F)
treat       4  11939 2984.82  13.612 3.516e-09 ***
Residuals 120  26314  219.28

The p-value = 0.0000000035 < 0.05 is highly significant at the 5% level. There is definitely a difference in longevity among the groups. Likely, it is solely due to the difference between "eight virgin females" and the other four groups. In fact, we can remove this group and fit a oneway ANOVA on the remaining four groups:

> f2=lm(lifespan[treat!="8virgin"]~treat[treat!="8virgin"])
> anova(f2)   # '!=' is 'not equal to'
Analysis of Variance Table

Response: lifespan[treat != "8virgin"]
                          Df  Sum Sq Mean Sq F value Pr(>F)
treat[treat != "8virgin"]  3   988.1  329.36  1.3869 0.2515
Residuals                 96 22798.5  237.48

We accept no difference among the remaining four groups! Only those receiving eight virgins had significantly lower mean lifetimes (in days).

---------------------------- Pr 3.4 ----------------------------

Look at the data:

> attach(pr03.5)
> pr03.5
> plot(strength~pctFiber)      # scatterplot
> boxplot(strength~pctFiber)   # side-by-side boxplots

Clearly strength goes down as a function of fiber%, but the relationship is not linear. It might be approximately quadratic, though.

> m1=lm(strength~pctFiber)                 # linear
> m2=lm(strength~pctFiber+I(pctFiber^2))   # quadratic
> m3=lm(strength~factor(pctFiber))         # separate means
> anova(m1,m2,m3)
Analysis of Variance Table

Model 1: strength ~ pctFiber
Model 2: strength ~ pctFiber + I(pctFiber^2)
Model 3: strength ~ factor(pctFiber)
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1     13 2.0080
2     12 1.5042  1   0.50381 4.1752 0.06827 .
3     10 1.2067  2   0.29752 1.2328 0.33222
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> AIC(m1,m2,m3)
   df      AIC
m1  3 18.40449
m2  4 16.07123
m3  6 16.76533

The AIC picks the quadratic model as adequate for these data. Significance testing picks linear, but the p-value is so close to 0.05 that I'd feel more comfortable leaving the quadratic term in. Either way is fine for your HW solutions.

Testing all three models against a constant mean yields significant p-values, i.e. the constant-mean model is always rejected:

> anova(m1)
Analysis of Variance Table

Response: strength
          Df Sum Sq Mean Sq F value    Pr(>F)
pctFiber   1 5.4613  5.4613  35.357 4.854e-05 ***
Residuals 13 2.0080  0.1545
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(m2)
Analysis of Variance Table

Response: strength
              Df Sum Sq Mean Sq F value    Pr(>F)
pctFiber       1 5.4613  5.4613 43.5690 2.535e-05 ***
I(pctFiber^2)  1 0.5038  0.5038  4.0192   0.06808 .
Residuals     12 1.5042  0.1253
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(m3)
Analysis of Variance Table

Response: strength
                 Df Sum Sq Mean Sq F value    Pr(>F)
factor(pctFiber)  4 6.2627 1.56567  12.975 0.0005713 ***

Strength is significantly associated with fiber%, and this relationship can be described as approximately quadratic. Important question: at which fiber% is strength *maximized*? The answer requires calculus.
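Specifically, for a fitted quadratic yhat = b0 + b1*x + b2*x^2, setting the derivative b1 + 2*b2*x to zero gives the optimum x* = -b1/(2*b2), which is a maximum whenever b2 < 0. A minimal sketch with made-up data (not the pr03.5 strength values):

```r
# Made-up toy data that rise then fall, roughly quadratically
x <- c(0, 5, 10, 15, 20)
y <- c(2.0, 3.4, 3.9, 3.5, 2.1)

m <- lm(y ~ x + I(x^2))
b <- coef(m)                  # coefficients: (Intercept), x, I(x^2)
xstar <- -b[2] / (2 * b[3])   # x at which the fitted curve peaks (since b[3] < 0)
unname(xstar)
```

With the real data, the same two lines applied to m2 would give the fiber% at which fitted strength is maximized.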