STAT 530, Fall 2016
--------------------

Homework 5
-----------


Do the following problems.  The first two (5.4 and Problem 2) are required for everyone; the third (6.3)
is required for graduate students and optional (extra credit) for undergraduates.

5.4, Problem 2 (see below)

Problem 2:  Do both a hierarchical clustering and a partitioning clustering
of the tennis racquet data on the course web page.
For each clustering, you may choose whichever specific method you prefer.
Give the partitions of racquets into clusters, give some plot(s)
to visualize the cluster structure, and attempt to characterize the clusters.

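The racquet data can be read into R with the following code:
# Read the data; the first column holds the racquet names
racq.data <- read.table("http://www.stat.sc.edu/~hitchcock/racquetsdata530.txt", header = TRUE)
racquet.names <- as.character(racq.data[, 1])
# Drop the name column, keeping only the numeric measurements
racquet.numeric.data <- racq.data[, -1]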

The variables in the tennis racquet data set are:
X1 = length of racquet (in inches)
X2 = static weight (in ounces) = this is how much the racquet actually weighs on a scale
X3 = balance (in inches) = this is a measure of whether the racquet is heavier on the head end or on the handle end;
     more negative values indicate a more head-heavy racquet, positive values indicate a more head-light racquet,
     and zero indicates an even balance.
X4 = swingweight = this is a complicated measure of how heavy the racquet FEELS when it is swung
X5 = headsize (in square inches) = the size of the racquet face (the strung area)
X6 = beamwidth (in mm) = the width of the cross-section (edge) of the racquet
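
Because these variables are measured in different units and on very different scales, Euclidean distances computed from the raw values would be dominated by the variables with the largest spread, so standardizing each variable first is usually sensible. The R sketch below shows one possible approach, not the required one. It assumes Ward linkage for the hierarchical clustering and K-means for the partitioning clustering, uses a hypothetical helper name cluster_racquets, and uses k = 3 only as a placeholder; the dendrogram should guide your actual choice of k.

```r
# Hypothetical helper (not part of the assignment): hierarchical (Ward) and
# K-means clusterings of a numeric data frame after standardizing its columns.
cluster_racquets <- function(X, k = 3, seed = 530) {
  Z <- scale(X)                        # center each column, divide by its SD
  d <- dist(Z)                         # Euclidean distances on standardized data
  hc <- hclust(d, method = "ward.D2")  # Ward linkage (an assumed choice)
  hc.groups <- cutree(hc, k = k)       # cut the dendrogram into k clusters
  set.seed(seed)                       # K-means uses random starting centers
  km <- kmeans(Z, centers = k, nstart = 25)  # keep the best of 25 starts
  list(hc = hc, hc.groups = hc.groups, km = km, km.groups = km$cluster)
}

# Example use with the data read in above (not run here; k = 3 is a placeholder):
# res <- cluster_racquets(racquet.numeric.data, k = 3)
# plot(res$hc, labels = racquet.names)                 # dendrogram
# split(racquet.names, res$hc.groups)                  # racquets in each hierarchical cluster
# split(racquet.names, res$km.groups)                  # racquets in each K-means cluster
# aggregate(racquet.numeric.data, list(cluster = res$km.groups), mean)  # cluster means
# pcs <- prcomp(scale(racquet.numeric.data))$x[, 1:2]  # first two principal components
# plot(pcs, col = res$km.groups, pch = 19)             # clusters in PC space
```

Other linkages (for example, method = "average" or "complete" in hclust) or a K-medoids method such as pam() from the cluster package are equally reasonable choices.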


GRADUATE STUDENT PROBLEM:  6.3*** (optional extra credit for undergraduate students)
NOTE:  Display the clustering result for the number of clusters that BIC suggests, and then
give the result for the best 3-cluster solution.  Which do you prefer?

IMPORTANT NOTE: For EACH of these problems, also write several sentences 
explaining in words what substantive conclusions about the data
you can draw from the plots and/or analyses.

NOTE:  The "legal offense" dissimilarity matrix, the "hair/eye" data, the
racquet data, and the pottery data in Table 6.3 are given on the course web page.

*** Read Section 6.3 for some insight into the variables in the pottery data set.
Also note that "No" (number) and "Kiln" are simply labeling variables and 
should NOT be included in the cluster analysis algorithm.
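
Since the pottery problem asks for the number of clusters that BIC suggests, it presumably calls for model-based (Gaussian mixture) clustering. A minimal R sketch, assuming the mclust package and a data frame (hypothetically named pottery) read from the course web page with columns named No and Kiln; the file location is not given here, so reading the data is left to you. The helper name pottery_mclust is also hypothetical.

```r
library(mclust)

# Hypothetical helper: model-based clustering of the pottery measurements,
# excluding the labeling columns "No" and "Kiln" from the analysis.
pottery_mclust <- function(pottery, label.cols = c("No", "Kiln")) {
  X <- pottery[, setdiff(names(pottery), label.cols), drop = FALSE]
  fit.bic <- Mclust(X)         # number of clusters and covariance model chosen by BIC
  fit.3   <- Mclust(X, G = 3)  # best model restricted to exactly 3 clusters
  list(data = X, bic = fit.bic, three = fit.3)
}

# Example use (not run; assumes `pottery` has been read in):
# fits <- pottery_mclust(pottery)
# fits$bic$G                                      # number of clusters BIC prefers
# plot(fits$bic, what = "BIC")                    # BIC for each model and number of clusters
# table(fits$bic$classification, pottery$Kiln)   # compare BIC clusters with kilns
# table(fits$three$classification, pottery$Kiln) # compare 3-cluster solution with kilns
```

Because Kiln is excluded from the fit, cross-tabulating the clusters against it afterwards is one way to judge which solution you prefer.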