STAT 530 (Applied Multivariate Statistics and Data Mining)
Fall 2022
Instructor
David Hitchcock, associate professor of statistics
Syllabus
Syllabus: (pdf document)
Office Hours -- Fall 2022
Monday, Tuesday, Wednesday, Friday 10:45-11:45 am; Thursday 10:00-11:00 am; or by appointment
Office: 215C LeConte College
Phone: 777-5346
E-mail: hitchcock@stat.sc.edu
Class Meeting Time
Mon-Wed-Fri 9:40-10:30 a.m., LeConte College Room 103 or via distance by streaming video on Blackboard Collaborate Ultra
Current Textbooks
An Introduction to Applied Multivariate Analysis with R (2011), by Brian Everitt and Torsten Hothorn (available as a free download at the textbook site, possibly only via USC computers).
An Introduction to Statistical Learning with Applications in R (2013), by James, Witten, Hastie, and Tibshirani (available as a free download at the ISL textbook site).
Courses that may serve as a prerequisite:
Any of the following: PSYC 228 or 709; EDRM 710; STAT 509, 515, 700, or 704; MGSC 291, 391 or 692; BIOS 700.
(If you have had a course that may be equivalent to one of these, please contact me about it.)
Course Description
530 Applied Multivariate Statistics and Data Mining (3) (Prereq: A grade of C or higher in STAT 515, STAT 205, STAT 509, STAT 512, ECON 436, MGSC 391, PSYC 228, or equivalent)
Introduction to fundamentals of multivariate statistics and data mining. Principal components and factor analysis; multidimensional scaling and cluster analysis; MANOVA and discriminant analysis; decision trees; and support vector machines. Use of appropriate software.
Purpose:
To introduce students with a variety of statistical backgrounds to the basic ideas in multivariate statistics.
The course will cover the assumptions, limitations, and uses of basic techniques such as cluster analysis, principal components analysis, and factor analysis, as well as how to implement these methods in R.
Rather than theoretical development, the focus will be on intuitive understanding and on students applying these methods to real data sets.
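As a rough illustration of what implementing one of these methods in R can look like, here is a minimal principal components sketch. It is not taken from the course materials: the built-in USArrests data set and the choice to standardize the variables are assumptions made only for this example.

```r
# Minimal PCA sketch (illustrative only; not the course's own example code).
# USArrests ships with base R; using it here is an assumption for the example.
data(USArrests)

# Center and scale the variables, so the PCA is based on the correlation matrix
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scree plot of the component variances
screeplot(pca, type = "lines", main = "Scree plot (USArrests)")

# Plot of the first two PC scores, labeled by state name
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(USArrests), cex = 0.6)
```

Standardizing is a common choice when the variables are measured on very different scales; with `scale. = FALSE` the analysis would instead use the covariance matrix.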
Available Computing Resources
R is available as a free download (from the CRAN home page).
These packages are also available on the computers in the labs in LeConte College (and a few other buildings).
Help in using R can be found on the CRAN home page.
Course Notes
Downloading Instructions for R
Computing Tips: Some Review
Computer Code for Class Examples
Example R Code
- Chapter 2 example R code (Enhanced scatterplots, Convex hull, Chi-plot, Bivariate boxplot, Bivariate density estimator, Bubble plot, Scatterplot matrix, 3-D scatterplot, Star plot, Chernoff faces, Pirate plots)
- Chapter 3 example R code (Principal Components Analysis, including scree plots, plots of PC scores, and CIs for variances of the population PCs)
- Chapter 4 example R code (this is Chapter 5 in the new Everitt/Hothorn textbook) (Factor Analysis, including rotations, plotting factor scores, and model diagnostics)
- Chapter 5 example R code (this is Chapter 4 in the new Everitt/Hothorn textbook) (Multidimensional Scaling and Correspondence Analysis)
- Chapter 7 example R code (Discriminant Analysis, Classification using Logistic Regression) (not part of new Everitt/Hothorn book)
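For readers who want a quick feel for one of the topics listed above before opening the chapter files, the following is a small sketch of classical multidimensional scaling. It is not the course's Chapter 5 code: the built-in eurodist distances and the two-dimensional solution are assumptions chosen only for illustration.

```r
# Classical (metric) multidimensional scaling sketch; illustrative only.
# eurodist is a built-in "dist" object of road distances between 21 European cities.
mds <- cmdscale(eurodist, k = 2, eig = TRUE)

# Two-dimensional coordinates, one row per city
coords <- mds$points

# Plot the configuration; asp = 1 keeps distances comparable on both axes
plot(coords[, 1], coords[, 2], type = "n",
     xlab = "Coordinate 1", ylab = "Coordinate 2", asp = 1)
text(coords[, 1], coords[, 2], labels = rownames(coords), cex = 0.7)
```

The orientation of an MDS solution is arbitrary, so the map may appear rotated or reflected relative to a geographic map.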
Example SAS Code
Data Sets
Other data sets used in the course:
- Blood Glucose data (from Table 8.6; slightly corrected from the book's printing, which had some typos)
Homework
Some Homework Solution Code and General Comments
Project Information
Midterm Exam Information
- Bulls data (data set 1 for midterm exam; the R code above will read this into R)
Final Exam Information
- Mammals Data (data set 1 for final exam; the R code above will read this into R)