Biostatistics Seminar: Elizabeth Purdom
DATE: Tuesday, May 13th, 2014
TIME: 4:10pm (refreshments at 3:30pm, MSB 4110)
LOCATION: Mathematical Sciences Building 1147
SPEAKER: Elizabeth Purdom, Dept of Statistics, UC Berkeley
TITLE: Shrinkage of Dispersion Estimates for Analysis of Exon Usage from mRNA-Seq data
ABSTRACT:
The prevalence of sequencing experiments in genomics has led to increased use of count-data methods for analyzing high-throughput genomic data, and shrinkage methods remain important for improving the performance of these methods. A common example is gene expression data, where the counts per gene are often modeled with some form of over-dispersed Poisson distribution. In this case, shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion when the number of samples is small. We address a different count setting introduced by sequencing data: testing for differential proportional usage with an over-dispersed binomial model. Such a model can be used to test for differential exon inclusion in mRNA-Seq experiments or differential allele frequencies in re-sequencing data. In this setting, fewer shrinkage methods for the dispersion parameter are available. We introduce a novel method that models the dispersion using the double binomial distribution proposed by Efron (1986), also known as the exponential dispersion model (Jorgensen, 1987). Our method, WEB-Seq, is an empirical Bayes strategy that produces a shrunken estimate of the dispersion and effectively detects differential proportional usage; it has close ties to the weighted-likelihood strategy of edgeR, developed for gene expression data (Robinson 2007, 2010). We analyze its behavior on simulated and real data sets and show that, compared with alternative approaches, our method is fast and powerful while controlling the false discovery rate (FDR).
This is joint work with Sean Ruddy.
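The abstract combines two ingredients: Efron's (1986) double binomial distribution for an over-dispersed proportion, and an edgeR-style weighted likelihood that shrinks each feature's dispersion toward a common value. The Python sketch below illustrates both under stated assumptions. It is not the WEB-Seq implementation. Every function name is hypothetical. The sketch further assumes a single group per feature with the pooled proportion plugged in as the mean, a fixed grid of dispersion values theta, and a hypothetical prior weight `alpha`. The double binomial is normalized exactly by summing over all counts rather than by using Efron's approximate constant.

```python
import math

def _xlogy(a, b):
    # Convention 0 * log(0) = 0, needed when the success probability is 0 or 1.
    return 0.0 if a == 0 else a * math.log(b)

def binom_logpmf(y, n, p):
    """Log of the ordinary binomial probability of y successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + _xlogy(y, p) + _xlogy(n - y, 1.0 - p))

def double_binom_logpmf(y, n, pi, theta):
    """Log-pmf of a double binomial (after Efron, 1986), assuming n > 0.

    Unnormalized kernel: binom(y; n, pi)**theta * binom(y; n, y/n)**(1 - theta).
    Constant factors such as theta**0.5 cancel because we normalize exactly
    over y = 0..n. theta = 1 recovers the binomial; theta < 1 is over-dispersed.
    """
    def log_kernel(k):
        return theta * binom_logpmf(k, n, pi) + (1.0 - theta) * binom_logpmf(k, n, k / n)
    logs = [log_kernel(k) for k in range(n + 1)]
    m = max(logs)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp
    return log_kernel(y) - log_norm

def feature_loglik(counts, theta):
    """Log-likelihood of theta for one feature's replicates [(y, n), ...]."""
    pi_hat = sum(y for y, _ in counts) / sum(n for _, n in counts)
    pi_hat = min(max(pi_hat, 1e-6), 1.0 - 1e-6)  # keep the mean strictly inside (0, 1)
    return sum(double_binom_logpmf(y, n, pi_hat, theta) for y, n in counts)

GRID = [0.05 * i for i in range(1, 31)]  # hypothetical candidate thetas, 0.05 to 1.5

def shrink_dispersion(features, alpha, grid=GRID):
    """Weighted-likelihood shrinkage in the spirit of edgeR (assumed form):
    theta_g = argmax over grid of l_g(theta) + alpha * mean_h l_h(theta)."""
    ll = [[feature_loglik(c, t) for t in grid] for c in features]
    common = [sum(col) / len(ll) for col in zip(*ll)]  # averaged "common" log-likelihood
    out = []
    for row in ll:
        scores = [l + alpha * c for l, c in zip(row, common)]
        out.append(grid[scores.index(max(scores))])
    return out

# Toy replicate counts (y = exon-inclusion reads, n = total reads); illustrative only.
features = [
    [(12, 40), (18, 42), (9, 38)],
    [(30, 50), (10, 48), (25, 52)],
    [(5, 20), (6, 22), (4, 19)],
]
print(shrink_dispersion(features, alpha=0.0))  # per-feature grid estimates
print(shrink_dispersion(features, alpha=5.0))  # estimates pulled toward the common value
```

With `alpha = 0` each feature keeps its own grid estimate. As `alpha` grows, every estimate moves toward the maximizer of the shared likelihood. That shared pull is what stabilizes dispersion estimates when there are only a few replicates.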