STAT718/BIO703

Spring 2026


Genomic Data Science: AI & Bioinformatics

Instructor: Yen-Yi Ho

Office: LeConte 216A

Class Meetings:

Monday/Wednesday 14:20-15:35

Classroom: LeConte 206


Dr. Ho's Office Hours: Thursday 16:00-17:00 and Friday 10:00-11:30, or by appointment (LeConte 216A)

Email: hoyen@stat.sc.edu

Course Website: https://people.stat.sc.edu/hoyen/STAT718/STAT718.html

Textbook:

1. Introduction to Data Science: Data Wrangling and Visualization with R by Rafael A. Irizarry (Required)

2. Python Data Science Handbook by Jake VanderPlas. Available at https://jakevdp.github.io/PythonDataScienceHandbook/

3. Python for Biologists Tutorial. Available at https://www.pythonforbiologists.org/

4. Machine Learning for Biology Tutorial. Available at https://pythonforbiologists.com/

5. Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville. Available at https://github.com/janishar/mit-deep-learning-book-pdf or https://www.deeplearningbook.org

6. Understanding Deep Learning by Simon J.D. Prince (MIT Press, 2023). Available at https://udlbook.github.io/udlbook/

Recommended

1. Python Crash Course, 3rd Edition: A Hands-On, Project-Based Introduction to Programming by Eric Matthes.

2. Deep Learning with PyTorch: Build, Train, and Tune Neural Networks Using Python Tools by Eli Stevens, Luca Antiga and Thomas Viehmann.


Announcements:

Approximate course outline: (Lecture notes will be updated often)

Columns: Date, Weekly topic, Homework, Code, Reading
Week 1
Jan 12
Syllabus


Lecture 1: Introduction to Genomic Data


Getting Started with Jupyter Notebook and JupyterLab


R Markdown (Chapter 20.2 in Rafa)




Homework Submission Instructions

Homework 1

Homework 1
Notebook



Homework 1 Solution

Google Colab Coupon


Link for Requesting an HPC Account (choose "Research Computing Account creation")


Week 2
Jan 19

Jan 19: No Class

Python Basic 1
  Data Types
   NumPy


Presentation Schedule

Paper presented by Macro

Paper presented by John Darden

Paper presented by Kasra

Paper presented by Jianyu

Paper1 presented by Grant
Paper2 presented by Grant

Cellpose paper presented by Xiuchuan

CGMega paper presented by Meysam

Cox-PASNet paper presented by Elaina

Nucleotide Transformer paper presented by Anderson



Python Basic 1
Notebook1

Python Basic 2
Notebook2


Python Data Science Handbook Chap 2 & 3
Week 3
Jan 26

Plots in Python

Python Functions
 
If Statements and Loops

Modules and Packages






Python Basic 3
Notebook3

                                       
Python Basic 4
Notebook4

Python Data Science Handbook Chap 4
Week 4
Feb 2

 
  
  Data Manipulation with Pandas

   Biopython Tutorial



Homework 1 Due
(Monday Feb 02)

Homework 2

Homework 2
Notebook


Homework 2 Solution


Paper about Chopsticks Gene


df3.csv

adata1_counts.csv.gz
adata2_counts.csv.gz

adata1_cell_metadata.csv
adata2_cell_metadata.csv
gene_metadata.csv

Python Basic 5
Notebook5


myseq.fa
Biopython Case Studies

Biopython Case Studies Notebook


Deep Learning by Prince Chap 1, 2, 3



Week 5
Feb 9

   
    
     Lecture 7: Machine Learning Part I (KNN)
    
     Lecture 8: Machine Learning Part II (Linear Classifier)

      Lecture 9: Regularization, Optimization and Performance Metrics
      
    The Good, the Bad and the Ugly


 

Homework 3

Homework 3 Notebook


TCGA Pancancer Expression Data

TCGA Pancancer Meta Data



KNN Classifier

Notebook

Linear Classifier

Linear Notebook


Deep Learning by Goodfellow et al. Chap 5

Deep Learning by Prince Chap 4, 5, 6
Week 6
Feb 16
 
  
  Lecture 10: Neural Networks & Backpropagation

   
 Lecture 11: Convolutional Neural Networks
  

 



Homework 2 Due
(Monday Feb 16)






Identifying hand-written digits using PyTorch

Notebook

Deep Learning by Goodfellow et al. Chap 6 & 7

Deep Learning by Prince Chap 7, 8, 9
Week 7
Feb 23



Lecture 11: DNA Convolutional Neural Networks and Applications in Regulatory

Genomics: DeepBind

DNA methylation: DeepCpG








Genomics_CNN
Genomics_CNN.ipynb







Deep Learning by Goodfellow et al. Chap 8 & 9


Deep Learning by Prince Chap 10
Week 8
March 2


      
 
 

 
  

     


Homework 3 Due
(Monday March 02)


Homework 3 Solution

Notebook






Deep Learning by Goodfellow et al. Chap 10 & 11

Deep Learning by Prince Chap 11

Week 9
March 9
  Spring Break: No Classes





Week 10
March 16

   

Lecture 12: CNN for Gene Coexpression (CNNC) in Single-Cell Data




Homework 4

Homework 4 Notebook






Final Project Proposal template




Deep Learning by Goodfellow et al. Chap 14

Deep Learning by Prince Chap 12 & 13
Week 11
March 23




From Language Models to Cell Types: Transformers in Genomic Data Science
     

Lecture 13: Transformer and Attention
  






Final Project Instructions




HPC tutorial

Linux Commands

Linux File Transfer






Deep Learning by Prince Chap 14 & 15
Week 12
March 30



Single-Cell Best Practices Book


Homework 4 Due
(Monday March 30)

Final Project Proposal Due
(Monday March 30)

  scBERT Slides

  Transformer Encoder Example


  Transformer Encoder Notebook



Deep Learning by Prince Chap 16 & 17
Week 13
April 6

   

    Graph Neural Networks


Final Project Template

Final Project Template Notebook




Karate Club Example

Karate Club Example Notebook


AlphaFold Simplified Example

AlphaFold Simplified Example Notebook



Deep Learning by Prince Chap 18 & 19
Week 14
April 13

  

    Graph Neural Networks

  Homework 4 Solution

Homework 4 Solution Notebook



Convolutional Layer

Convolutional Layer Notebook

Deep Learning by Prince Chap 20 & 21

Week 15
April 20

    Student Presentations
Final Project Presentation

Presentation Rubric
 
 



Week 16
April 27

    Student Presentations
Final Project Due
(Monday May 4, before 5 PM)