STAT 530, Fall 2022
--------------------
Homework 4
-----------
NOTE: For all students, all problems (1, 2, and 3) are mandatory.
IMPORTANT NOTE: For EACH of these problems, give the output with the factor loadings and
(if possible for the problem) a plot of the factor scores. Also write several sentences
explaining in words the substantive conclusions about the data
that you can draw from the plots and/or analyses.
ALWAYS MAKE AN ATTEMPT TO INTERPRET THE FACTORS! Sometimes this works better than other times...
NOTE: The "school subjects" correlation matrix, the "pain" correlation matrix,
and the Foodstuff Contents data set are given on the course web page and shown below.
### Problem 1:
Do a factor analysis on the "school subjects" correlation matrix, with a varimax rotation.
Briefly compare your rotated loadings to the loadings given by an unrotated solution.
Discuss your choice of the number of factors.
# R code to produce the "school subjects" correlation matrix on page 160 of the Everitt and Hothorn textbook.
# The six variables are test scores on these subjects: (French, English, History, Arithmetic, Algebra, Geometry).
# There were 220 individuals in this data set.
school.sub.corr <- matrix( c(
1,.44,.41,.29,.33,.25,
.44,1,.35,.35,.32,.33,
.41,.35,1,.16,.19,.18,
.29,.35,.16,1,.59,.47,
.33,.32,.19,.59,1,.46,
.25,.33,.18,.47,.46,1
), nrow=6, ncol=6, byrow=TRUE)
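The mechanics of fitting a factor model from a correlation matrix, rather than from raw data, can be sketched as follows. This is only an illustration: the value k = 2 is an assumed placeholder and not a recommended answer, and the variable names attached through dimnames are added here purely so that the printed loadings are easier to read.

```r
# "School subjects" correlation matrix, copied from the assignment.
# The dimnames are an addition for readable output.
subjects <- c("French", "English", "History", "Arithmetic", "Algebra", "Geometry")
school.sub.corr <- matrix(c(
  1,   .44, .41, .29, .33, .25,
  .44, 1,   .35, .35, .32, .33,
  .41, .35, 1,   .16, .19, .18,
  .29, .35, .16, 1,   .59, .47,
  .33, .32, .19, .59, 1,   .46,
  .25, .33, .18, .47, .46, 1
), nrow = 6, ncol = 6, byrow = TRUE, dimnames = list(subjects, subjects))

# ASSUMPTION: k is an illustrative number of factors, not a suggested answer.
k <- 2

# With only a correlation matrix available, pass it through 'covmat' and
# supply the sample size through 'n.obs' so the chi-square test can be computed.
fit.unrot   <- factanal(covmat = school.sub.corr, factors = k,
                        n.obs = 220, rotation = "none")
fit.varimax <- factanal(covmat = school.sub.corr, factors = k,
                        n.obs = 220, rotation = "varimax")

# cutoff = 0 prints every loading instead of blanking out the small ones.
print(fit.unrot$loadings, cutoff = 0)
print(fit.varimax$loadings, cutoff = 0)
```

Rotation changes the loadings but not the uniquenesses, so the two fits describe the same model. Because no raw data are supplied, factanal cannot return factor scores for this problem.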
### Problem 2:
Do a factor analysis on the "pain" correlation matrix, with a varimax rotation.
Discuss your choice of the number of factors.
# R code to produce the "pain" correlation matrix.
# There were 123 individuals in this data set.
# The 9 variables were individuals' answers to nine survey questions.
# Each statement was scored on a scale from 1 to 6, ranging from agreement
# to disagreement. The nine pain statements were as follows:
# 1. Whether or not I am in pain in the future depends on the skills of the doctors.
# 2. Whenever I am in pain, it is usually because of something I have done or not done.
# 3. Whether or not I am in pain depends on what the doctors do for me.
# 4. I cannot get any help for my pain unless I go to seek medical advice.
# 5. When I am in pain I know that it is because I have not been taking proper exercise or eating the right food.
# 6. People’s pain results from their own carelessness.
# 7. I am directly responsible for my pain.
# 8. Relief from pain is chiefly controlled by the doctors.
# 9. People who are never in pain are just plain lucky.
pain.corr <- matrix( c(
1,-.04,.61,.45,.03,-.29,-.3,.45,.3,
-.04,1,-.07,-.12,.49,.43,.3,-.31,-.17,
.61,-.07,1,.59,.03,-.13,-.24,.59,.32,
.45,-.12,.59,1,-.08,-.21,-.19,.63,.37,
.03,.49,.03,-.08,1,.47,.41,-.14,-.24,
-.29,.43,-.13,-.21,.47,1,.63,-.13,-.15,
-.3,.3,-.24,-.19,.41,.63,1,-.26,-.29,
.45,-.31,.59,.63,-.14,-.13,-.26,1,.4,
.3,-.17,.32,.37,-.24,-.15,-.29,.4,1
), nrow=9, ncol=9, byrow=TRUE)
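One informal aid when thinking about the number of factors is to look at the eigenvalues of the correlation matrix. The sketch below does this for the "pain" matrix. It is offered only as a supplementary heuristic: the "eigenvalue greater than 1" count comes from principal components practice and is not the same thing as the chi-square test that factanal reports, so the two need not agree.

```r
# "Pain" correlation matrix, copied from the assignment.
pain.corr <- matrix(c(
  1,    -.04,  .61,  .45,  .03, -.29, -.3,   .45,  .3,
  -.04,  1,   -.07, -.12,  .49,  .43,  .3,  -.31, -.17,
  .61,  -.07,  1,    .59,  .03, -.13, -.24,  .59,  .32,
  .45,  -.12,  .59,  1,   -.08, -.21, -.19,  .63,  .37,
  .03,   .49,  .03, -.08,  1,    .47,  .41, -.14, -.24,
  -.29,  .43, -.13, -.21,  .47,  1,    .63, -.13, -.15,
  -.3,   .3,  -.24, -.19,  .41,  .63,  1,   -.26, -.29,
  .45,  -.31,  .59,  .63, -.14, -.13, -.26,  1,    .4,
  .3,   -.17,  .32,  .37, -.24, -.15, -.29,  .4,   1
), nrow = 9, ncol = 9, byrow = TRUE)

# Eigenvalues of the correlation matrix, largest first.
eig <- eigen(pain.corr, symmetric = TRUE, only.values = TRUE)$values

# Heuristic count: number of eigenvalues exceeding 1.
n.above.one <- sum(eig > 1)

# Cumulative proportion of total variance (the eigenvalues sum to 9, the trace).
cum.prop <- cumsum(eig) / sum(eig)

print(round(eig, 3))
print(n.above.one)
print(round(cum.prop, 3))

# Scree plot: look for the point where the curve levels off.
plot(eig, type = "b", xlab = "Component number", ylab = "Eigenvalue",
     main = "Scree plot, pain correlation matrix")
abline(h = 1, lty = 2)
```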
### Problem 3:
Do a factor analysis on the Foodstuff Contents data set (same data as with HW 3). Use a varimax rotation.
Discuss your choice of the number of factors. Calculate factor scores for the
individual items, plot the factor scores using appropriate plot(s), and discuss your findings. See NOTE below!
The "Contents of Foodstuffs" data set is given on the course web page.
Full descriptions of the observation names, in order, are given in the vector below.
This R code will read in the data:
food.full <- read.table("http://people.stat.sc.edu/hitchcock/foodstuffs.txt", header=TRUE)
food.labels <- as.character(food.full[,1])
food.data <- food.full[,-1]
food.descriptions <- c('beef_braised','hamburger','beef roast','beef_steak','beef_canned','chicken_broiled',
'chicken_canned','beef_heart','lamb_leg_roast','lamb_shoulder_roast','ham_smoked','pork_roast','pork_simmered',
'beef_tongue','veal_cutlet','bluefish_baked','clams_raw','clams_canned','crabmeat_canned','haddock_fried',
'mackerel_broiled','mackerel_canned','perch_fried','salmon_canned','sardines_canned','tuna_canned','shrimp_canned')
# The food descriptions are provided mostly just so you'll know what the abbreviated labels stand for.
# To see the labels and descriptions together:
cbind(food.labels,food.descriptions)
NOTE: For Problem 3, if you use the 'factanal' function to perform the factor analysis on the Foodstuffs data set,
it will not allow you to choose 3 or more factors for a data set with only 5 variables.
In this case (for the purposes of this HW), it is OK to choose the highest number of factors that the 'factanal' function will allow,
even if the chi-square test indicates that this number of factors is not quite sufficient.
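The limit described in the note follows from the degrees of freedom of the maximum likelihood factor model: with p variables and k factors, the degrees of freedom are ((p - k)^2 - (p + k)) / 2, and a fit is only possible when this is not negative. The short sketch below evaluates that formula for p = 5. The helper name factor_dof is hypothetical and introduced only for this illustration.

```r
# Degrees of freedom of the ML factor model with p variables and k factors.
# 'factor_dof' is a hypothetical helper name used only for this illustration.
factor_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2

p <- 5        # number of variables, as stated in the note
k <- 1:3      # candidate numbers of factors

dof.table <- data.frame(factors = k, dof = factor_dof(p, k))
print(dof.table)

# Largest number of factors with non-negative degrees of freedom.
max.factors <- max(k[factor_dof(p, k) >= 0])
print(max.factors)
```

For p = 5 the degrees of freedom are 5, 1, and -2 for one, two, and three factors, so two is the largest number of factors that can be requested.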