STAT 530 (Applied Multivariate Statistics and Data Mining)

Fall 2018

Instructor

David Hitchcock, associate professor of statistics

Syllabus

Syllabus: (Word document) or (PDF document)

Office Hours -- Fall 2018

Monday, Wednesday, and Friday 10:50-11:50 am; Tuesday 1:00-2:00 pm; or by appointment

Office: 209A LeConte College
Phone: 777-5346
E-mail: hitchcock@stat.sc.edu

Class Meeting Time

Mondays and Wednesdays 2:20-3:35 pm, Wardlaw 116, or by distance via streaming video

Current Textbooks

An R and S-PLUS Companion to Multivariate Analysis (2005), by Brian Everitt.
An Introduction to Statistical Learning with Applications in R (2013), by James, Witten, Hastie, and Tibshirani (available as a free download at the ISL textbook site).

Any of the following courses may serve as a prerequisite: PSYC 228 or 709; EDRM 710; STAT 509, 515, 700, or 704; MGSC 291, 391, or 692; BIOS 700.
(If you have had a course that may be equivalent to one of these, please contact me about it.)

Course Description

530: Applied Multivariate Statistics and Data Mining (3) (Prereq: A grade of C or higher in STAT 515, STAT 205, STAT 509, STAT 512, ECON 436, MGSC 391, PSYC 228, or equivalent) Introduction to fundamentals of multivariate statistics and data mining. Principal components and factor analysis; multidimensional scaling and cluster analysis; MANOVA and discriminant analysis; decision trees; and support vector machines. Use of appropriate software.

Purpose: To introduce students with a variety of statistical backgrounds to the basic ideas in multivariate statistics. The course covers the assumptions, limitations, and uses of basic techniques such as cluster analysis, principal components analysis, and factor analysis, as well as how to implement these methods in R. Rather than theoretical development, the focus will be on intuitive understanding and on students applying these methods to real data sets.

Available Computing Resources

R is available as a free download from the CRAN home page, and students who want SAS can buy a copy from USC Computer Services.
Both packages are also available on the lab computers in LeConte College (and in a few other buildings). Help with using R can be found on the CRAN home page.

Course Notes

Downloading Instructions for R

Computing Tips: Some Review

Computer Code for Class Examples

Example R Code

Example SAS Code

Data Sets

The data sets from the book may be found under "Data Files" at the textbook website:
http://biostatistics.iop.kcl.ac.uk/publications/everitt/.

Other data sets used in the course:

Homework

Some Homework Solution Code and General Comments

Midterm Exam Information