STAT 530, Fall 2016
--------------------

Homework 6
-----------

ALL students should do all of the following problems:

IMPORTANT NOTE: For EACH of these problems, write a couple of sentences explaining in words what substantive conclusions about the data you can draw from the plots and/or analyses.

PROBLEM 1:
---------------

Use linear discriminant analysis (LDA) to build a classification rule to classify the Bumpus bird data into two groups ("survived" and "died") based on the 5 numerical measurements. Assume equal prior probabilities of surviving and dying. The Bumpus bird data (along with a survival/death indicator vector) can be read in using the following R code:

# Read the data and give the columns descriptive names
bumpbird <- read.table("http://www.stat.sc.edu/~hitchcock/bumpusbird.txt", header=TRUE)
names(bumpbird) <- c("ID", "tot.length", "alar.length", "beak.head.length", "humerus.length", "keel.stern.length")
attach(bumpbird)
# Separate the ID column from the 5 numerical measurements
bumpbird.numeric <- bumpbird[,-1]
bumpbird.IDs <- bumpbird[,1]
# The first 21 birds survived; the remaining 28 died
survival.indicator <- as.factor(c(rep("survived",times=21),rep("died",times=28)))

(a) Use the LDA rule to predict the survival status for a hypothetical bird with:

tot.length=156, alar.length=242, beak.head.length=31.4, humerus.length=18.1, keel.stern.length=19.4

Give the probability of surviving for such a bird.

(b) Find the plug-in misclassification rate and the cross-validation misclassification rate for the LDA classification rule.

PROBLEM 2:
---------------

Do Problem 7.4 in the Everitt textbook, but use the CLASSIFICATION TREE approach on the Skulls data: obtain the classification tree (show the plot of the tree), and classify the new skull with the measurements given in Problem 7.4 into an Epoch. You may assume equal prior probabilities of being in each category.
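HINT for Problem 2 (illustration only): the general pattern for fitting a classification tree with equal priors, plotting it, and classifying a new observation can be sketched as follows. This sketch assumes the 'rpart' package and uses R's built-in 'iris' data as a stand-in; it does NOT use the Skulls data, and the object names (fit, new.obs, pred) and the measurements in new.obs are made up for the example. Other tree functions would work as well.

```r
library(rpart)  # assumed available; it is a recommended package in standard R installs

# Stand-in data: iris has 3 classes, so equal priors are 1/3 each
fit <- rpart(Species ~ ., data = iris, method = "class",
             parms = list(prior = rep(1/3, 3)))

# Plot the fitted tree and label the splits and leaves
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)

# A hypothetical new observation (values chosen only for illustration)
new.obs <- data.frame(Sepal.Length = 6.0, Sepal.Width = 3.0,
                      Petal.Length = 4.5, Petal.Width = 1.4)

# Predicted class for the new observation
pred <- predict(fit, newdata = new.obs, type = "class")
print(pred)
```

For the homework itself, the response would be the Epoch factor, the predictors would be the skull measurements, and the prior vector would have one equal entry per Epoch.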
PROBLEM 3:
---------------

Use Hotelling's T^2 test and the test score data set (scores on math and reading tests given to a sample of girls and a sample of boys) to test for a difference between the mean score vector of the boys and the mean score vector of the girls. The following R code will read in the data:

testdata <- read.table("http://www.stat.sc.edu/~hitchcock/testscoredata.txt", header=TRUE)
attach(testdata)
testdata.noIDs <- testdata[,-1] # to remove the ID numbers

PROBLEM 4:
---------------

Consider the 'hsb' data set that we have studied in class. Suppose our goal is to compare the mean vectors (where the variables are the scores on: read, write, math, science, socst) among the different levels of 'ses' (high, middle, and low socioeconomic classes).

hsb <- read.table("http://www.stat.sc.edu/~hitchcock/hsbdata.txt", header=TRUE)
attach(hsb)
# Keep ALL rows, and only the columns needed for this problem
hsb.prob4 <- hsb[,c(5,8,9,10,11,12)]

###############################################

(a) Conduct the MANOVA F-test using Wilks' Lambda to test for a difference in (read, write, math, science, socst) mean vectors across the three ses classes. Use a 0.05 significance level, and give the P-value of the test.

(b) Check to see whether the assumptions of your test are met. Do you believe your inference is valid?

(c) Examine the sample mean vectors for each group. Informally comment on the differences among the groups in terms of the specific variables.
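HINT for Problem 4 (illustration only): the call pattern for a one-way MANOVA with Wilks' Lambda, and for group-wise sample mean vectors, can be sketched as follows. The sketch uses R's built-in 'iris' data as a stand-in and does NOT use the hsb data; the object names (Y, fit, res, p.value, group.means) are made up for the example.

```r
# Stand-in data: 4 response variables, one 3-level grouping factor
Y <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])

# One-way MANOVA of the response matrix on the grouping factor
fit <- manova(Y ~ Species, data = iris)

# Wilks' Lambda, its approximate F statistic, and the P-value
res <- summary(fit, test = "Wilks")
print(res)
p.value <- res$stats["Species", "Pr(>F)"]

# Sample mean vector for each group (one row per group)
group.means <- aggregate(iris[, 1:4],
                         by = list(Species = iris$Species), FUN = mean)
print(group.means)
```

For the homework itself, the response matrix would hold the five test scores and the grouping factor would be 'ses' (make sure it is treated as a factor, not as a number).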