My research focuses on statistical modeling and its
applications to the analysis of -omics data to improve human health.
Because I hold a joint appointment, a second focus of my research
is collaborative work, such as serving as co-investigator (co-I)
on grant proposals and co-authoring collaborative publications.
My recent methodological research focuses on developing novel
approaches for differential co-expression (DC) analysis. DC
analysis examines whether the correlation in expression among a
set of genes changes across biological conditions. Such a
coordinated change in expression suggests possible co-regulation
related to the biological condition in question. Our recent
project aims to develop novel analytical tools to detect DC gene
combinations using data generated by single-cell RNA sequencing
(scRNAseq). Our work in this area is ongoing. My research team
recently published three papers related to this topic, listed
below. We also have an R21 project funded by
[NIH/National Cancer Institute (NCI)] in this research direction.
In addition, integrative multi-omics data analysis has become
increasingly popular with recent advances in technology.
Because each data type usually has a distinct marginal
distribution, a joint study of correlation presents a statistical
challenge. We introduced a flexible copula-based framework to
study correlation changes across different data types.
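To make the idea of differential co-expression concrete, here is a minimal sketch that compares the correlation of two genes between two conditions. It uses the classical Fisher z-test for two independent correlations, which is a generic textbook technique chosen only for illustration; it is not the liquid association, bivariate count regression, or copula methods developed in the papers listed below, and it does not account for the count or zero-inflated nature of sequencing data. The function names and the toy expression values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_dc_test(x1, y1, x2, y2):
    """Test whether corr(x, y) differs between condition 1 and condition 2.

    Returns (r1, r2, z, p), where p is a two-sided p-value based on the
    normal approximation to the Fisher z-transformed correlations.
    Assumes independent samples and at least 4 observations per condition.
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform
    se = math.sqrt(1.0 / (len(x1) - 3) + 1.0 / (len(x2) - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r1, r2, z, p


# Toy (made-up) expression values for two genes under two conditions.
gene_x = [1, 2, 3, 4, 5, 6, 7, 8]
gene_y_cond_a = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.3, 7.9]  # rises with gene_x
gene_y_cond_b = [7.8, 7.1, 6.2, 4.9, 4.1, 3.0, 2.2, 0.9]  # falls with gene_x

r_a, r_b, z_stat, p_value = fisher_z_dc_test(
    gene_x, gene_y_cond_a, gene_x, gene_y_cond_b
)
print(f"r (condition A) = {r_a:.3f}, r (condition B) = {r_b:.3f}")
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3g}")
```

In this toy example the two genes are positively correlated in one condition and negatively correlated in the other, which is the kind of correlation change a DC analysis is designed to detect.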
1. Modeling liquid association using gene expression data generated from microarray experiments.
Biometrics paper (2011). Software package.
2. Flexible bivariate correlated count data regression for bulk RNA-seq data.
Statistics in Medicine paper (2020).
3. Modeling dynamic correlation in zero-inflated bivariate count data with applications to single-cell RNA sequencing data.
Biometrics paper (2021).
4. Ma Z., Davis S.W., and Ho Y.-Y. (2022). Flexible copula model for integrating correlated multi-omics data from single-cell experiments.
Biometrics paper (to appear).
5. nPARS: A comprehensive search algorithm for constructing Bayesian networks using large-scale genomic data.
Data: SNP, Gene Expression, and Cytotoxicity Outcomes
(ldocauc for the area under the log-dose response curve of docetaxel and
lfuauc for the area under the log-dose response curve of 5-FU).
Gene expression and cytotoxicity outcomes are
standardized ((x - mean(x))/sd(x)).
Algorithm: README, R source code.
6. Using gene expression to improve the power of genome-wide association analysis.
R source code, README.
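For readers working with the data in item 5, the standardization applied to the gene expression and cytotoxicity outcomes can be sketched as follows. This is only an illustration: the function name `standardize` and the example values are hypothetical, and the use of the sample standard deviation (divisor n - 1, matching the default of R's `sd()`) is an assumption, since the page does not state which divisor was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (x - mean(x)) / sd(x) for each x in values.

    Uses the sample standard deviation (n - 1 divisor); this choice is an
    assumption made for the sketch.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Made-up outcome values, for illustration only.
raw = [2.0, 4.0, 6.0]
print(standardize(raw))
```

After standardization each variable has mean 0 and standard deviation 1, so variables measured on different scales become directly comparable.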