[Bioc-sig-seq] EdgeR questions in analyzing 454 data-about prior.n, TMM, and p_value
Mark Robinson
mrobinson at wehi.EDU.AU
Tue Oct 19 00:43:10 CEST 2010
Hi Ying.
Some comments below.
On 2010-10-18, at 10:22 PM, Ying Ye wrote:
> Dear edgeR users and developers,
>
> I have a few questions about edgeR, which I recently started using
> for 454 pyrosequencing data:
>
> 1. prior.n
> According to the user's manual, prior.n should not be set too low in
> the moderated tagwise dispersion approach. But in my dataset there are
> more than 15 samples in each comparison group, and the degrees of
> freedom exceed 30. prior.n <- estimateSmoothing(d) gives 0.0005329. So
> I am wondering whether I can use 0.0005329, since I have a rather large
> number of samples in each group, or whether I should set prior.n to 10
> as the manual suggests.
Well, it's hard to give a prescription for prior.n that suits all datasets. Since you have so many degrees of freedom, you shouldn't need prior.n as high as 10. You might try something lower, say 1-3.
> 2. TMM
> I am not sure whether this is also applicable to 454 microbiota data.
> I suppose I should do TMM normalization as well, since the
> normalization factors for my samples vary widely (f ranges from
> 0.41 to 4.58). Is that right?
I must admit that I'm not intimately familiar with all the nuances of microbiota data, but I will say that the factors you mention are more extreme than we generally see in RNA-seq data. It's probably best to look at some "smear" plots (through maPlot(), for example) to assess whether the TMM normalization is appropriately capturing shifts due to composition or the like.
As always for exploratory analysis, it would be good to look at multidimensional scaling plots; see plotMDS.dge(). There is no substitute for looking at your data.
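To make the idea of a composition shift concrete, here is a minimal Python sketch of a simplified trimmed mean of M-values. It is an illustration only, not edgeR's implementation: the function names and the toy counts are made up for this example, and it leaves out the trimming on A-values, the precision weights, and the rescaling of factors that the real method applies.

```python
import math

def ma_values(sample, ref):
    """Per-feature M (log-ratio) and A (mean log-abundance) values,
    computed on counts scaled by each library's total size."""
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # the log is undefined when either count is zero
        p_s, p_r = y_s / n_s, y_r / n_r
        m_vals.append(math.log2(p_s / p_r))
        a_vals.append(0.5 * math.log2(p_s * p_r))
    return m_vals, a_vals

def trimmed_mean_factor(sample, ref, trim=0.3):
    """Simplified scaling factor: 2 ** (unweighted mean of the M-values
    left after dropping the lowest and highest `trim` fraction)."""
    m_vals, _ = ma_values(sample, ref)
    if not m_vals:
        raise ValueError("no feature has nonzero counts in both libraries")
    m_sorted = sorted(m_vals)
    k = int(len(m_sorted) * trim)
    kept = m_sorted[k:len(m_sorted) - k] if k > 0 else m_sorted
    return 2 ** (sum(kept) / len(kept))

# Hypothetical toy data: nine taxa are unchanged, one taxon dominates the sample.
ref = [100] * 10
sample = [100] * 9 + [1100]
factor = trimmed_mean_factor(sample, ref)
print(factor)  # approximately 0.5
```

In this toy case the sample's library is twice as large as the reference only because of the one dominant taxon. The trimmed mean ignores that taxon, and the factor of about 0.5 shrinks the effective library size back to that of the reference. Plotting the M-values against the A-values from ma_values() gives the kind of smear plot mentioned above.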
> 3. p_value
> In your experience, is it reasonable and reliable to use
> p_value < 0.05 as the significance criterion? Or is only < 0.01
> reliable?
First off, you'll probably want to do some multiple testing correction, which can be done through the topTags() function. As to where to set the threshold on significance, that is a matter of your false discovery tolerance ... the conventional cutoff is a 5% false discovery rate, but you may want to be more or less stringent.
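For a feel of what such a correction does, here is a minimal stand-alone Python sketch of the Benjamini-Hochberg adjustment, one common false discovery rate procedure. It is not edgeR code, and the function name and the p-values are made up for this example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five tags.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(pvals)
significant = [p <= 0.05 for p in adj]
print(adj)
print(significant)
```

Four of these raw p-values are below 0.05, but only the first two remain below 0.05 after adjustment, which is why the threshold is better set on the adjusted values than on the raw ones.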
Hope that helps.
Mark
> I am a new user of this package and hope you can give me some
> suggestions. Many thanks!
>
> Ying Ye
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------