[BioC] Up & Downregulated genes using DESeq
Simon Anders
anders at embl.de
Tue Mar 27 16:06:54 CEST 2012
Hi
On 03/27/2012 03:43 PM, Sunny Yu Liu wrote:
> Because no matter how fancy the statistics is, we have to find something
> having biological significance. For example, in most time, even have
> p<0.000000000001, still get no biological effects.
> For most gene expression change, people always use fold change 2 as a
> cutoff for microarray or qPCR. As for RNAseq, since the method is much
> more sensitive, I guess it must lose some specificity, so I think it may
> need a higher cutoff number than 2.
I guess this needs some clarification before we confuse newcomers to
RNA-Seq too much.
A cut-off of 2 on a log2 scale means a fold change of at least
four-fold. This is a lot, and there are plenty of cases where a weaker
signal is biologically meaningful. Depending on the experimental setup,
fold changes of +/-20% (.26 on a log2 scale) or even much less can be
informative.
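
To make the two scales concrete, here is a small Python sketch (not part
of the original discussion; the function names are only illustrative)
that converts between a plain expression ratio and its log2 value:

```python
import math

def log2_fold_change(ratio):
    """Convert a plain expression ratio (e.g. 1.2 for +20%) to the log2 scale."""
    return math.log2(ratio)

def fold_change(log2_fc):
    """Convert a log2 fold change back to a plain ratio."""
    return 2.0 ** log2_fc

print(fold_change(2))                   # 4.0
print(round(log2_fold_change(1.2), 2))  # 0.26
print(round(log2_fold_change(0.8), 2))  # -0.32
```

Note that the log2 scale is symmetric in ratios, not in percentages: a
20% increase is about 0.26, while a 20% decrease is about -0.32.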
In most RNA-Seq experiments, a p value cut-off to control the false
discovery rate at some sensible value, say 5% or 10%, will not allow any
genes with a log2 fold change below a certain value to be called
significant, and in my experience, this value nearly always makes
further fold-change cut-offs unnecessary.
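
As an illustration of what such a cut-off looks like in practice, the
following is a minimal Python sketch of the Benjamini-Hochberg
adjustment. That this is the procedure meant here is an assumption; the
p values in the example are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p values for four hypothetical genes.
pvals = [0.01, 0.04, 0.03, 0.005]
padj = bh_adjust(pvals)
# Genes called significant at a false discovery rate of 5%.
hits = [i for i, p in enumerate(padj) if p < 0.05]
print(padj, hits)
```

Calling everything with an adjusted p value below 0.05 a hit controls
the expected fraction of false positives among the hits at 5%.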
This belief that fold-change cut-offs are important may stem from the
proliferation of incorrect analysis methods in the RNA-Seq literature.
In very many papers, analysis methods based on Fisher's test or on a
likelihood ratio test based on a Poisson distribution are used. In fact,
there are even several reviews which suggest such approaches.
Apart from the fact that these tests are simply inadmissible (see e.g.
Baggerly et al., Bioinformatics 19 (2003) 1477), they cause a peculiar
pattern: They give all genes with expression strength above a few tens
of thousands of reads absurdly low p values even in case of very weak
fold changes, so that nearly all strongly expressed genes are significant.
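
The pattern can be reproduced with a toy calculation. The Python sketch
below is only an illustration under stated assumptions: it uses a
Wald-type normal approximation on the log ratio of two single counts,
and the dispersion value 0.01 is an arbitrary stand-in for
sample-to-sample variability. It is not how DESeq computes its p values.

```python
import math

def two_sided_p(z):
    """Two-sided p value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def log_ratio_p(k1, k2, dispersion=0.0):
    """Approximate p value for the log ratio of two counts.

    Delta method: Var(log K) is roughly 1/mu + dispersion, with mu
    estimated by the observed count. dispersion=0 is the pure Poisson
    case; a positive value adds overdispersion (assumed, not estimated).
    """
    se = math.sqrt(1.0 / k1 + 1.0 / k2 + 2.0 * dispersion)
    return two_sided_p(math.log(k2 / k1) / se)

# The same 10% change at low and at high expression strength.
for k1, k2 in [(50, 55), (50000, 55000)]:
    print(k1, k2,
          "Poisson p = %.3g" % log_ratio_p(k1, k2),
          "overdispersed p = %.3g" % log_ratio_p(k1, k2, dispersion=0.01))
```

Under the Poisson assumption the standard error shrinks without limit as
counts grow, so even a 10% change becomes "significant" for strong genes.
With any non-zero dispersion the standard error has a floor, and the
same change stays unremarkable.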
I've seen in several posts on this and other mailing lists the advice to
use a very small p value threshold (adjusted p value < .001 or the
like), combined with a fold change cut-off, to rectify the situation.
With a correct test, it is rare that you have significant genes with
fold changes so weak that you have to doubt their biological relevance.
I see, however, at least one case where a fold-change cut-off is useful:
It is unavoidable that the decision boundary that separates significant
from non-significant fold-changes goes down with increasing count
values. Hence, any list of "hits" will be enriched for strong genes. If
this is problematic for downstream analysis, one may opt, as a crude but
workable remedy, to decide on a fold change cut-off and consider all
genes below this as not significant _and_ omit from the universe of all
enrichment tests all genes with a count value below the count required
for this fold change to become significant (i.e., considering these weak
genes as essentially "not testable" rather than "not significant").
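
A minimal Python sketch of this remedy follows. All names and the toy
gene table are hypothetical, and the count threshold (`min_count`) is
taken as an input: the analyst would have to read it off from their own
results as the count at which the chosen fold change just becomes
significant.

```python
def classify_genes(genes, log2fc_cutoff, min_count, alpha=0.1):
    """Label each gene as 'hit', 'not significant' or 'not testable'.

    genes: list of dicts with keys 'id', 'mean_count', 'log2fc', 'padj'.
    min_count: count below which the cut-off fold change could not have
    reached significance (supplied by the analyst, not computed here).
    """
    labels = {}
    for g in genes:
        if g["mean_count"] < min_count:
            # Too weak to ever be called at this fold change.
            labels[g["id"]] = "not testable"
        elif g["padj"] < alpha and abs(g["log2fc"]) >= log2fc_cutoff:
            labels[g["id"]] = "hit"
        else:
            # Includes significant genes below the fold-change cut-off.
            labels[g["id"]] = "not significant"
    return labels

# Made-up example values, for illustration only.
genes = [
    {"id": "gA", "mean_count": 20,    "log2fc": 0.6, "padj": 0.40},
    {"id": "gB", "mean_count": 500,   "log2fc": 0.7, "padj": 0.01},
    {"id": "gC", "mean_count": 80000, "log2fc": 0.2, "padj": 0.001},
    {"id": "gD", "mean_count": 900,   "log2fc": 0.1, "padj": 0.80},
]
labels = classify_genes(genes, log2fc_cutoff=0.5, min_count=100)
# The universe for enrichment tests excludes the 'not testable' genes.
universe = [gid for gid, lab in labels.items() if lab != "not testable"]
print(labels, universe)
```

The point of the third label is that the weak genes are dropped from the
background as well as from the hit list, so the enrichment test compares
hits only against genes that had a chance of being called.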
Simon