[BioC] how edgeR control outliers?
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Mar 1 23:50:10 CET 2012
Dear Yuan,
The deviance is a standard quantity in generalized linear model theory,
analogous to the residual sum of squares in ANOVA. It is usually treated
as chi-square distributed on the residual degrees of freedom, although
this approximation can be rough in some cases. See for example:
http://en.wikipedia.org/wiki/Deviance_(statistics)
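
As a rough illustration of the chi-square approximation, here is a small
simulation sketch in base R. It uses Poisson counts rather than the negative
binomial model of edgeR, and the two-group layout, the mean of 50 and all
variable names are assumptions made up for the example.

```r
# Sketch only: simulate counts with no group effect, fit a Poisson GLM to
# each simulated "gene", and compare the residual deviances with a
# chi-square distribution on the residual degrees of freedom.
set.seed(1)
n_sim <- 1000                                  # hypothetical number of genes
group <- factor(rep(c("A", "B"), each = 3))    # hypothetical 3 vs 3 design

dev <- replicate(n_sim, {
  y <- rpois(length(group), lambda = 50)
  deviance(glm(y ~ group, family = poisson))
})

# Residual df = number of samples minus number of fitted coefficients
df_resid <- length(group) - nlevels(group)

# If the approximation holds, the average deviance is close to df_resid
mean(dev)

# QQ-plot of the deviances against chi-square quantiles
qqplot(qchisq(ppoints(n_sim), df = df_resid), dev,
       xlab = "Chi-square quantiles", ylab = "Residual deviance")
abline(0, 1)
```

With small counts the points tend to drift away from the line, which is the
roughness mentioned above.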
Yes, when I said to test for outliers using the gof() function in
https://stat.ethz.ch/pipermail/bioconductor/2012-January/043187.html
I meant that outliers are those with large gof statistics. The
calculation of p-values to test for outliers is already done for you by
the gof() function.
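
A minimal sketch of that workflow on simulated data follows. The counts, the
planted outlier and the object names are all made up for illustration, and I
am assuming the gof() output components gof.statistics, gof.pvalues, outlier
and df as documented in edgeR.

```r
# Sketch only: simulated negative binomial counts with one planted outlier.
library(edgeR)
set.seed(42)

n_genes <- 500                                   # hypothetical
group   <- factor(rep(c("A", "B"), each = 4))    # hypothetical 4 vs 4 design
design  <- model.matrix(~ group)

counts <- matrix(rnbinom(n_genes * length(group), mu = 100, size = 10),
                 nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))
counts[1, 1] <- 50000          # plant an extreme count in the first gene

y   <- DGEList(counts = counts, group = group)
y   <- estimateGLMCommonDisp(y, design)
fit <- glmFit(y, design)       # uses the common dispersion stored in y

g <- gof(fit)                  # default p-value cutoff and adjustment

head(sort(g$gof.pvalues))      # smallest goodness-of-fit p-values
which(g$outlier)               # genes flagged as outliers
```

Fitting with a common dispersion is a deliberate choice here: a gene whose
counts vary much more than the common dispersion allows gets a large
deviance and hence a small p-value.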
Figure 2 of the following article provides some plots of gof() statistics:
http://nar.oxfordjournals.org/content/early/2012/01/28/nar.gks042
The plots are made by
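library(limma)  # provides zscoreGamma()
g <- gof(fit)
# convert the gof statistics to normal z-scores; chi-square(df) = Gamma(shape=df/2, scale=2)
z <- zscoreGamma(g$gof.statistics, shape=g$df/2, scale=2)
qqnorm(z)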
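
The shape and scale arguments in the zscoreGamma() call above come from the
fact that a chi-square distribution on df degrees of freedom is a gamma
distribution with shape df/2 and scale 2. The base R sketch below checks
this numerically; the values of df and x are arbitrary, and the last line
only illustrates the idea of a z-score, not the exact implementation in
limma.

```r
# Sketch only: chi-square(df) and Gamma(shape = df/2, scale = 2) agree.
df <- 4                        # hypothetical residual degrees of freedom
x  <- c(0.5, 2, 4, 10, 25)     # hypothetical gof statistics

p_chisq <- pchisq(x, df = df)
p_gamma <- pgamma(x, shape = df / 2, scale = 2)
all.equal(p_chisq, p_gamma)    # the two cumulative probabilities match

# Mapping the probabilities to normal quantiles gives z-scores that are
# standard normal when the chi-square approximation holds.
z_sketch <- qnorm(p_gamma)
z_sketch
```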
Another very useful diagnostic is to plot the tagwise dispersions against
abundance. Outliers may appear as large dispersions. The development
version of edgeR provides a function, plotBCV(), to do this.
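
A minimal sketch of that plot on simulated data, assuming an edgeR version
that includes plotBCV(). The counts, the five deliberately noisy genes and
the object names are made up for illustration.

```r
# Sketch only: estimate tagwise dispersions and plot them against abundance.
library(edgeR)
set.seed(7)

n_genes <- 500                                   # hypothetical
group   <- factor(rep(c("A", "B"), each = 4))    # hypothetical 4 vs 4 design
design  <- model.matrix(~ group)

counts <- matrix(rnbinom(n_genes * length(group), mu = 100, size = 10),
                 nrow = n_genes)
# make the first five genes much more variable than the rest
counts[1:5, ] <- rnbinom(5 * length(group), mu = 100, size = 0.5)
rownames(counts) <- paste0("gene", seq_len(n_genes))

y <- DGEList(counts = counts, group = group)
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)   # squeezes towards the common value

# Biological coefficient of variation (sqrt of dispersion) vs abundance
plotBCV(y)
```

Genes sitting well above the bulk of the points are the ones worth
inspecting by hand.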
Best wishes
Gordon
> Date: Wed, 29 Feb 2012 20:09:06 -0800
> From: Yuan Tian <ytianidyll at ucla.edu>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] how edgeR control outliers?
>
> Dear all,
>
> I'm currently using edgeR to detect differentially expressed genes
> from an RNA-seq dataset, and I'm also using the gof() function to test
> for potential outliers. I have two questions regarding the outlier
> detection, and would like to have your suggestions.
>
> 1) How is an outlier defined? Is it a gene that has a deviance
> larger than a threshold? How is the deviance contained in the glmfit
> data calculated?
>
> 2) The gof() function assumes that the deviance follows a
> chi-squared distribution. But what is the statistical basis for this
> assumption?
>
> Thanks!
>
> Yuan