STAT 530, Fall 2016
--------------------

Homework 2
-----------

****************************************************************************************
For this HW, all problems are mandatory for both graduate students and undergraduate students.
****************************************************************************************

NOTE: The air pollution data set (from chapter 2) is given on the course web page.
You should use the FULL data set for the problems given below.

You can read the data into R (as a data frame) with the code:

airpol.full <- read.table("http://www.stat.sc.edu/~hitchcock/airpoll.txt", header=TRUE)
city.names <- as.character(airpol.full[,1])
airpol.data <- airpol.full[,2:8]

Do the following 3 problems from the textbook (but read my notes below), and the other problems given below:

2.1**, 2.2***, 2.6****

NOTE: For EACH of these problems, also write several sentences explaining in words what substantive conclusions about the data you can draw from the plots.

** For 2.1: Do Problem 2.1 from the textbook, but simply do a regular star plot for all 7 variables like we discussed in class, not the kind the book describes where they add the stars to a scatterplot. Also do a plot using Chernoff Faces. Write a short paragraph explaining what the plots tell you about the cities. You can include the "labels" argument to label the drawings for both the stars function and the faces function, e.g.:

labels=city.names

within the call of each function.

*** For 2.2: Do Problem 2.2 from the textbook, but just do the ordinary scatterplot matrix for this data set. Write a short paragraph explaining the main conclusions from the scatterplot matrix.

**** For 2.6: For Problem 2.6, you don't need to give chiplots for ALL pairs of variables. Just give them for a few pairs and write comments about those. This exercise is just to give you practice in making and interpreting chiplots.

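As a rough illustration of the notes for 2.1 and 2.2, here is a minimal sketch using only base R graphics (the stars and pairs functions). The helper name plot_overview is made up for this example and is not something from the textbook or the course code. Chernoff faces are left out because the faces function is not part of base R graphics.

```r
# Hypothetical helper (name invented for illustration): draws a star plot
# of all columns of `dat`, then an ordinary scatterplot matrix.
plot_overview <- function(dat, labels) {
  # Star plot: one star per row (city), one ray per variable.
  # `labels` puts a name under each star, as suggested in the note for 2.1.
  stars(dat, labels = labels, main = "Star plot")

  # Ordinary scatterplot matrix of every pair of variables (note for 2.2).
  # This is drawn on a new plot page after the star plot.
  pairs(dat, main = "Scatterplot matrix")

  invisible(NULL)
}
```

With the data read in as shown above, a call such as plot_overview(airpol.data, city.names) would draw both figures, one after the other.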
EXTRA PROBLEM 1: Do a bivariate boxplot of the pair of variables "Education" and "Mortality" from the air pollution data set. Explain what the plot tells you about the relationship between the two variables. Do you see any outliers? If so, which cities are they?

EXTRA PROBLEM 2: Do a bubble plot with "Education" and "Mortality" on the axes and "Population Density" represented by the bubbles. Explain what the plot tells you about the relationships among the three variables. Comment on any notable cities.
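
For Extra Problem 2, one possible way to draw a bubble plot in base R is the symbols function. The sketch below is only an illustration: the helper name bubble_plot is invented here, and taking the square root of the third variable (so that bubble area, rather than radius, tracks the value) is a design choice of this example, not a requirement of the assignment.

```r
# Hypothetical helper (name invented for illustration): bubble plot of y
# against x, with circle size driven by a third, non-negative variable.
bubble_plot <- function(x, y, size, labels = NULL, ...) {
  # Radii proportional to sqrt(size), so circle AREA is proportional to size.
  # `inches = 0.3` rescales the circles so the largest one is about 0.3 inches.
  symbols(x, y, circles = sqrt(size), inches = 0.3, ...)

  # Optionally print a label (e.g. the city name) at each bubble's center.
  if (!is.null(labels)) {
    text(x, y, labels = labels, cex = 0.6)
  }

  invisible(NULL)
}
```

A call might look like bubble_plot(airpol.data$Education, airpol.data$Mortality, airpol.data$Popden, labels = city.names, xlab = "Education", ylab = "Mortality"). The column name Popden for population density is an assumption; check names(airpol.data) for the actual names in the file.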