STAT 530, Fall 2018
--------------------

Homework 4
-----------

NOTE:  For undergraduate students, Problems 1, 2, and 3 are mandatory.  Problems 4 and 5 are optional for undergraduate students.

For graduate students, it is mandatory to do Problems 1, 2, and 3, and EITHER of Problems 4 and 5 (your choice).  Graduate students may optionally choose to do all 5 problems if they wish.

If you do one or more optional problems, they will be graded for a small amount of extra credit.

IMPORTANT NOTE: For EACH of these problems, also write several sentences
explaining in words what substantive conclusions about the data
you can draw from the plots and/or analyses.

ALWAYS MAKE AN ATTEMPT TO INTERPRET THE FACTORS!  Sometimes this works better than other times...

NOTE:  The "school subjects" correlation matrix, the "pain" correlation matrix
and the Foodstuff Contents data set are given on the course web page.

### Problem 1:

Do a factor analysis on the "school subjects" correlation matrix, with a varimax rotation.
Compare your rotated loadings to the loadings given in the book in Table 4.12.

# R code to produce the "school subjects" correlation matrix in Problem 4.6, pages 89-90.
# The names of the six variables are (French, English, History, Arithmetic, Algebra, Geometry).
# There were 220 individuals in this data set.

school.sub.corr <- matrix( c(
1,.44,.41,.29,.33,.25,
.44,1,.35,.35,.32,.33,
.41,.35,1,.16,.19,.18,
.29,.35,.16,1,.59,.47,
.33,.32,.19,.59,1,.46,
.25,.33,.18,.47,.46,1
), nrow=6, ncol=6, byrow=TRUE)
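
A minimal sketch of one way to set this up, using 'factanal' from base R's stats package, which accepts a correlation matrix through its 'covmat' argument.  The choice of two factors is an assumption made only for illustration (check it against Table 4.12), and the object names 'subjects' and 'school.fa' are hypothetical.

```r
# Variable names from the problem statement, used to label the loadings.
subjects <- c("French", "English", "History", "Arithmetic", "Algebra", "Geometry")

school.sub.corr <- matrix(c(
1,.44,.41,.29,.33,.25,
.44,1,.35,.35,.32,.33,
.41,.35,1,.16,.19,.18,
.29,.35,.16,1,.59,.47,
.33,.32,.19,.59,1,.46,
.25,.33,.18,.47,.46,1
), nrow = 6, ncol = 6, byrow = TRUE, dimnames = list(subjects, subjects))

# Maximum likelihood factor analysis fitted from the correlation matrix.
# ASSUMPTION: factors = 2 is used for illustration only.
# n.obs = 220 lets factanal report its chi-square goodness-of-fit test.
school.fa <- factanal(covmat = school.sub.corr, factors = 2,
                      n.obs = 220, rotation = "varimax")

# cutoff = 0 prints every loading instead of blanking out the small ones.
print(school.fa$loadings, cutoff = 0)
```

Printing 'school.fa' itself also shows the uniquenesses and the test of whether the chosen number of factors is sufficient.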


### Problem 2:

Do Problem 4.7(b,c) in the Everitt textbook.  For 4.7(c), do only an orthogonal rotation using "varimax", not an oblique rotation.

# R code to produce the "pain" correlation matrix in Problem 4.7, page 90
# The names of the nine variables are given on page 90.  There were 123 individuals in 
# this data set.

pain.corr <- matrix( c(
1,-.04,.61,.45,.03,-.29,-.3,.45,.3,
-.04,1,-.07,-.12,.49,.43,.3,-.31,-.17,
.61,-.07,1,.59,.03,-.13,-.24,.59,.32,
.45,-.12,.59,1,-.08,-.21,-.19,.63,.37,
.03,.49,.03,-.08,1,.47,.41,-.14,-.24,
-.29,.43,-.13,-.21,.47,1,.63,-.13,-.15,
-.3,.3,-.24,-.19,.41,.63,1,-.26,-.29,
.45,-.31,.59,.63,-.14,-.13,-.26,1,.4,
.3,-.17,.32,.37,-.24,-.15,-.29,.4,1
), nrow=9, ncol=9, byrow=TRUE)
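
The exact wording of 4.7(b,c) is in the textbook and is not reproduced here.  As a hedged sketch, assuming part (b) involves choosing the number of factors, the loop below fits maximum likelihood models with 1 to 4 factors (an arbitrary range picked for illustration) and collects the p-value of the chi-square test that 'factanal' reports when 'n.obs' is supplied.  The names 'n.factors' and 'pain.pvals' are hypothetical.

```r
pain.corr <- matrix(c(
1,-.04,.61,.45,.03,-.29,-.3,.45,.3,
-.04,1,-.07,-.12,.49,.43,.3,-.31,-.17,
.61,-.07,1,.59,.03,-.13,-.24,.59,.32,
.45,-.12,.59,1,-.08,-.21,-.19,.63,.37,
.03,.49,.03,-.08,1,.47,.41,-.14,-.24,
-.29,.43,-.13,-.21,.47,1,.63,-.13,-.15,
-.3,.3,-.24,-.19,.41,.63,1,-.26,-.29,
.45,-.31,.59,.63,-.14,-.13,-.26,1,.4,
.3,-.17,.32,.37,-.24,-.15,-.29,.4,1
), nrow = 9, ncol = 9, byrow = TRUE)

n.factors <- 1:4   # ASSUMPTION: range chosen only for illustration

# For each candidate number of factors, fit the model and keep the p-value
# of the test that this many factors is sufficient.  varimax is factanal's
# default rotation; the test itself does not depend on the rotation.
# If a fit fails to converge, record NA instead of stopping.
pain.pvals <- vapply(n.factors, function(k) {
  tryCatch(
    unname(factanal(covmat = pain.corr, factors = k, n.obs = 123)$PVAL),
    error = function(e) NA_real_
  )
}, numeric(1))

names(pain.pvals) <- paste(n.factors, "factor(s)")
print(round(pain.pvals, 4))
```

A small p-value suggests that the corresponding number of factors is not sufficient; the rotated loadings of the chosen model can then be examined for 4.7(c).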

### Problem 3:

Do a factor analysis on the Foodstuff Contents data set.  Use a rotation, if appropriate.
Discuss your choice of the number of factors.  Calculate factor scores for the 
individual items, plot the factor scores using appropriate plot(s), and discuss your findings.

The "Contents of Foodstuffs" data set (in Table 3.6) is given on the course web page.
Full descriptions of the observation names are on p. 63 of the book.

This R code will read in the data:

food.full <- read.table("http://www.stat.sc.edu/~hitchcock/foodstuffs.txt", header=TRUE)
food.labels <- as.character(food.full[,1])
food.data <- food.full[,-1]

NOTE:  For Problem 3, if you use the 'factanal' function to perform the factor analysis on the Foodstuffs data set,
it will not allow you to choose 3 or more factors, because the data set has only 5 variables.  In this case (for the purposes
of this HW) it is OK to choose the highest number of factors that the 'factanal' function will allow,
even if the chi-square test indicates that this number of factors is not quite sufficient.
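
The limit comes from the degrees of freedom of the k-factor model with p variables, ((p - k)^2 - (p + k)) / 2, which 'factanal' requires to be non-negative.  A short sketch of the arithmetic (the helper name 'fa.df' is made up for illustration):

```r
# Degrees of freedom of a k-factor model fitted to p variables.
# fa.df is a hypothetical helper, not part of any package.
fa.df <- function(p, k) ((p - k)^2 - (p + k)) / 2

# With p = 5 variables, try k = 1, 2, 3 factors.
fa.df(5, 1:3)
# → 5 1 -2
```

With 5 variables, two factors leave 1 degree of freedom, while three factors would give a negative value, so two is the largest number allowed.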

### Problem 4:

Table 5.12 summarizes data from a study in which subjects were asked to compare eight legal offenses and to say how
dissimilar each one was to each of the others.  The dissimilarity matrix below shows, for each pair of offenses, the
percentage of subjects who judged that the two offenses were very dissimilar.  The following R code will read in the
dissimilarity matrix and create a vector of labels for the eight offenses.

offenses.diss <- matrix( c(
0,21.1,71.2,36.4,52.1,89.9,53.0,90.1,
21.1,0,54.1,36.4,54.1,75.2,73.0,93.2,
71.2,54.1,0,36.4,52.1,36.4,75.2,71.2,
36.4,36.4,36.4,0,0.7,54.1,52.1,63.4,
52.1,54.1,52.1,0.7,0,53.0,36.4,52.1,
89.9,75.2,36.4,54.1,53.0,0,88.3,36.4,
53.0,73.0,75.2,52.1,36.4,88.3,0,73.0,
90.1,93.2,71.2,63.4,52.1,36.4,73.0,0
), nrow=8, ncol=8, byrow=TRUE)
the.offenses <- c('assault.battery', 'rape', 'embezzlement', 'perjury', 'libel', 'burglary', 'prostitution', 'receive.stolen.goods')

Find a two-dimensional multidimensional scaling solution, plot the offenses on a 2-D map, and try to interpret 
the dimensions underlying the subjects' judgments.
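
A minimal sketch of one possible approach, assuming classical (metric) multidimensional scaling via 'cmdscale' from base R; the problem does not say which MDS variant to use, so that choice is an assumption.  The object name 'offenses.mds' is hypothetical.

```r
the.offenses <- c('assault.battery', 'rape', 'embezzlement', 'perjury',
                  'libel', 'burglary', 'prostitution', 'receive.stolen.goods')

offenses.diss <- matrix(c(
0,21.1,71.2,36.4,52.1,89.9,53.0,90.1,
21.1,0,54.1,36.4,54.1,75.2,73.0,93.2,
71.2,54.1,0,36.4,52.1,36.4,75.2,71.2,
36.4,36.4,36.4,0,0.7,54.1,52.1,63.4,
52.1,54.1,52.1,0.7,0,53.0,36.4,52.1,
89.9,75.2,36.4,54.1,53.0,0,88.3,36.4,
53.0,73.0,75.2,52.1,36.4,88.3,0,73.0,
90.1,93.2,71.2,63.4,52.1,36.4,73.0,0
), nrow = 8, ncol = 8, byrow = TRUE,
dimnames = list(the.offenses, the.offenses))

# ASSUMPTION: classical MDS. k = 2 requests a two-dimensional solution;
# eig = TRUE also returns the eigenvalues, which help judge the fit.
offenses.mds <- cmdscale(as.dist(offenses.diss), k = 2, eig = TRUE)

# Draw an empty plot with equal axis scales, then place the labels
# at the fitted coordinates.
plot(offenses.mds$points, type = "n", asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2")
text(offenses.mds$points, labels = the.offenses, cex = 0.8)
```

Setting asp = 1 keeps distances on the map comparable in both directions.  A nonmetric solution (for example, 'isoMDS' in the MASS package) is an alternative if only the rank order of the dissimilarities is to be trusted.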

### Problem 5:

### FOR GRADUATE STUDENTS, THIS 5th PROBLEM IS ONE OF THE TWO CHOICES (PROBLEM 4 OR 5, AS NOTED AT THE TOP); IT IS OPTIONAL (EXTRA CREDIT) FOR UNDERGRADS.

For the life expectancy data from the class examples, fit a ONE-FACTOR factor analysis model separately to the
life expectancies for men and for women.  Attempt to interpret the factor for each data set, if possible, and
make plots of the factor scores (try to plot the single-factor scores for men and those for women on the same scatterplot).
The following R code should create the two data sets (for men and for women) needed to do the problem.

life.df.full <- read.table("http://www.stat.sc.edu/~hitchcock/lifeex.txt", header=TRUE)
country.names <- life.df.full[,1]
life.df.men <- life.df.full[,2:5]
row.names(life.df.men) <- country.names
life.df.women <- life.df.full[,6:9]
row.names(life.df.women) <- country.names