A Bayesian approach to identify genes with multiple expression patterns for paired RNA-seq data
Jing Qiu
University of Delaware
It is often of research interest to identify genes with specific expression patterns over several conditions, such as time points or genotypes. The common practice is to perform differential expression analysis separately for each condition and then combine the results to obtain a list of genes with the desired expression patterns or profiles. Such practice can inflate the type I error for identifying genes with different expression patterns under multiple conditions, especially when the desired expression pattern involves equal expression under certain conditions. In this paper, we propose a Bayesian approach to identify genes with multiple expression patterns under two conditions while controlling the false discovery rate (FDR) for all desired expression patterns simultaneously. We develop a hierarchical Bayesian partition model for paired RNA-seq data that jointly models gene expression levels under two conditions with multiple expression patterns. An inverse moment non-local prior is used to model expression patterns with equal expression under one condition. Our simulation studies show that it is much more challenging to identify genes that are equally expressed in one condition but differentially expressed in the other than to identify genes that are differentially expressed in both conditions. The common practice in the literature can have a highly inflated type I error for identifying the former type of genes. Our method controls the FDR close to the nominal level in almost all cases considered, regardless of the expression pattern, and can have better power than popular methods in the literature for identifying genes that are differentially expressed in both conditions when the sample size is larger or when the proportion of such genes is large. We further apply our model to identify different gene expression patterns under two genotypes from parental or progeny lines of a soybean dataset.
Refreshments at 3:45pm in Snedecor 2101.