STAT 770: Categorical Data Analysis, Spring 2017


Instructor: Tim Hanson. E-mail: hansont@stat.sc.edu.
Office Hours: Tuesday/Thursday 10am to 11am, and by appointment. I am also available by Skype.
Office: 219C LeConte College, (803) 777-3859.
Class Meeting: Tuesday/Thursday 11:40am - 12:55pm in WMBB Nursing, room 409, or online.
Textbook: Categorical Data Analysis, Third Edition by Alan Agresti.
Video link for STAT 770.
Prerequisites: STAT 704 and consent of instructor, or BIOS 759.

Course description

We will cover a good portion of Categorical Data Analysis (3rd Edition) by Alan Agresti. If time permits, we will cover additional topics including generalized additive mixed models and Bayesian approaches. Timetable:

One week: categorical data; inference for binomial and multinomial proportions; Wald, score, and likelihood ratio tests.
One week: contingency tables: two proportions, stratified 2 by 2 tables, measuring association.
One week: inference for I by J tables: confidence intervals, testing, chi-squared tests, ordinal outcomes, small-sample inference.
One week: generalized linear models: binary and count regression, quasi-likelihood.
One week: logistic regression I: interpretation, inference, multiple predictors, fitting.
One week: logistic regression II: model selection, diagnostics, Mantel-Haenszel statistic, quasi and complete separation, sample size & power.
One week: logistic regression III: other links, Bayesian inference, conditional logistic regression, generalized additive models, ROC curves & discrimination.
One week: multinomial regression: baseline-category logits, ordinal regression, discrete choice models.
One week: matched pairs: repeated measures on a proportion, conditional logistic regression, symmetry, dependency measures.
One week: repeated measures: generalized estimating equations, Markov transitions.
One week: repeated measures: generalized linear mixed models.
One week: log-linear models I: two-way tables, inference, and fitting.
One week: log-linear models II: multi-way tables, model building and interpretation, conditional independence graphs and collapsibility.
One week: topics in machine learning: logistic regression, boosting, and support vector machines.

Learning outcomes

By the end of the course, students should be able to:
► identify designs of contingency tables and recommend appropriate measures of association and statistical tests;
► develop models for binary, polytomous and multivariate categorical responses, interpret results regardless of model parameterization, and diagnose model fits;
► interpret and communicate categorical data methods to a technical audience;
► develop log-linear models for multi-way contingency tables and interpret the conditional independence structure through the use of association graphs; and
► analyze dependent categorical data using both classical approaches and mixed-effects models.

Expectations

All students are expected to:
► Attend or view all class sessions. Live attendance, for those who can manage it, is highly appreciated; this includes distance students calling in with questions during the lecture. I understand that this is not possible for some of you. You are encouraged to use a computer during class to "play along" as we go.
► Review the required reading and/or lecture notes before class. Handouts (if any) and course notes will be posted on the course webpage the day before each class.
► Attempt all of the assigned homework problems and email them to the TA by noon on the due date. Start homework SOON after it is assigned; this is especially important in a class involving computing. Do not email me about an assignment the night before it is due.

Textbook and reading

(1) Agresti, A. (2013). Categorical Data Analysis (3rd edition), Wiley. (2) Course notes. Sample SAS code for fitting models, as well as many of the data sets used in the book, can be found at Alan Agresti's website, which also has answers to many of the odd-numbered problems.

Lectures

You may (a) attend the live class in person, (b) watch it live via web-streaming from any remote site, with call-in or instant-message capability for real-time questions, and/or (c) watch the recorded class later with rewind, pause, and fast-forward tools. Recordings will be posted within 24 hours of each class and can be viewed at any time. The ID is statistics (lowercase) and the password is ARTS#2016 (uppercase, no space). You can also watch live from home during class time through web conferencing via Adobe Connect.

Computing

SAS is the computing environment used in this course. SAS OnDemand for Academics is a free web-based version of SAS available to those enrolled in STAT 770. To use web-based SAS from your browser, first register (detailed registration instructions are available); after you have registered, enroll in STAT 770.

Log in to SAS OnDemand for Academics. Under "Applications" click on "SAS Studio" to get started.

Accommodations for disabilities

If you require special accommodations for a disability, these must be arranged in advance through the Office of Student Disability Services in room 112A LeConte (777-6142, TDD 777-6744, sasds@mailbox.sc.edu).

Homework

Homework will be posted on the class website at least one week before the due date; there will be roughly 8 to 10 assignments. Send your solutions by e-mail as a single file in MS Word or PDF format to the Teaching Assistant, Yawei Liang (yliang@email.sc.edu), with "STAT 770 Homework" in the subject line (do not cc me) by noon on the due date. Use only one side of each page, put your name on every page, and number the pages. Any handwriting must be clearly legible after scanning; do not use soft pencil.

Grading

The minimum percentage needed for each grade is: A 90%, B+ 87%, B 80%, C+ 77%, C 70%, D 60%. A final course percentage under 60% receives an F.

Honor Code

The official honor code is the Carolinian Creed in the Carolina Community: Student Handbook & Policy Guide. If you violate the honor code, I am required to report the case to the University's academic integrity office. If you are "found responsible" in the ensuing deliberations, the penalty will be a reduction of at least one letter grade in the course. Examples of honor code violations include, but are not limited to: copying, or allowing someone else to copy, solutions to assignments; posing as another student to do assignments or exams; and hiring or persuading someone else to do assignments in your place. The whole point of this is to learn! Do not treat the course as an "obstacle" to overcome; treat it as an opportunity to develop new, powerful tools for analyzing categorical data and to deepen your understanding of statistics.

Some additional comments

► Working together on homework problems is permitted and encouraged, but each student should write up his/her solutions independently (this will help you develop understanding). I will make available an email list of course participants so that you may contact one another about homework, etc. Joining the list is strictly voluntary; I will announce it during the first lecture.
► The distance aspect of this course affords you the flexibility to attend class at your own convenience. However, I have found that students who are not disciplined tend to fall behind. Stay current with the lectures! Attending live (in class) or streaming is preferred. If you watch lectures later, it helps to keep a set schedule, e.g. watching at the same times each week. Re-watching portions of a lecture can help with difficult concepts. If you have questions, attend office hours, make an appointment to meet me, or email me. I can also set up a Skype appointment if you live far from campus.