[BioC] edgeR: effect of 'outlier' tags on differential expression calls
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Apr 25 11:09:18 CEST 2012
Dear Alessandro,
You seem to be giving examples of miRs that are expressed at a high level in
just one sample. The easiest way to deal with such miRs, if you really
don't want to detect them, is to filter out miRs that fail to be expressed
to a reasonable degree in at least four samples (since your groups are of
size four). See for example pages 24-25 of the edgeR user's guide, where
this is done for the Dclk1 mouse case study. We often suggest cpm>1 for
at least m samples, where m is the minimum group size.
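The filtering rule above (cpm>1 in at least m samples) can be sketched in a few lines. This is only an illustration in plain Python, not edgeR code; in edgeR itself the same idea is usually written along the lines of `keep <- rowSums(cpm(d) > 1) >= 4`. The function names `cpm` and `keep_tag` are made up for this sketch, and the library sizes are hypothetical because the real totals are not given in the thread.

```python
def cpm(counts, lib_sizes):
    """Counts per million for one tag, given per-sample library sizes."""
    return [1e6 * c / n for c, n in zip(counts, lib_sizes)]


def keep_tag(counts, lib_sizes, min_cpm=1.0, min_samples=4):
    """True if the tag exceeds min_cpm in at least min_samples samples."""
    expressed = sum(1 for value in cpm(counts, lib_sizes) if value > min_cpm)
    return expressed >= min_samples


# Counts for two miRs copied from the original message (samples 1-8).
counts = {
    "hsa-miR-520f": [0, 0, 0, 0, 0, 0, 163, 0],
    "hsa-miR-517a": [100, 5, 4, 2, 6, 45, 10024, 3],
}

# HYPOTHETICAL library sizes (one million reads per sample); an assumption
# made only so that the example runs.
lib_sizes = [1_000_000] * 8

# m = 4 because the smallest group has four samples.
kept = {name: keep_tag(c, lib_sizes, min_samples=4) for name, c in counts.items()}
print(kept)
```

Under these assumed library sizes, a tag such as hsa-miR-520f, which is seen in sample 7 only, is filtered out, while a tag with low but non-zero counts in every sample passes the filter.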
Another obvious thing to do is to examine an MDS plot to identify outlier
samples.
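To make the idea behind the MDS check concrete, here is a rough sketch of the kind of between-sample distance that such a plot visualizes. It is not what edgeR's MDS plotting function (such as `plotMDS`) computes exactly. It simply takes root-mean-square differences of log2-cpm values and reports the sample that is, on average, furthest from the others. The helper names, the prior count of 0.5 and the library sizes are all assumptions made for illustration.

```python
import math


def sample_distances(table, lib_sizes, prior=0.5):
    """Root-mean-square log2-cpm difference between every pair of samples.

    table is a list of rows (one per tag); each row holds one count per sample.
    A small prior count is added to avoid taking log(0).
    """
    n = len(lib_sizes)
    logs = [
        [math.log2(1e6 * (row[j] + prior) / lib_sizes[j]) for row in table]
        for j in range(n)
    ]
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            squares = [(x - y) ** 2 for x, y in zip(logs[a], logs[b])]
            dist[a][b] = math.sqrt(sum(squares) / len(squares))
    return dist


def most_isolated(dist):
    """Index of the sample with the largest mean distance to the others."""
    n = len(dist)
    mean_dist = [sum(row) / (n - 1) for row in dist]
    return max(range(n), key=mean_dist.__getitem__)


# Four rows of counts copied from the original message (samples 1-8).
table = [
    [3, 1, 1, 1, 1, 7, 1601, 3],      # hsa-miR-515-3p
    [4, 0, 1, 0, 1, 2, 1715, 2],      # hsa-miR-518e
    [12, 1, 0, 1, 1, 4, 1754, 1],     # hsa-miR-519d
    [100, 5, 4, 2, 6, 45, 10024, 3],  # hsa-miR-517a
]

# HYPOTHETICAL library sizes; the real totals are not given in the thread.
lib_sizes = [1_000_000] * 8

dist = sample_distances(table, lib_sizes)
outlier = most_isolated(dist)
print("most isolated sample:", outlier + 1)
```

On a real data set the distances would be computed over all tags rather than a hand-picked few, and the plot itself is still the better diagnostic.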
I also have to point out that the output in your email cannot be correct,
or at least it cannot all be from the same R session. The
sessionInfo() output says edgeR 2.6.0, but the column headings show that
the results being presented are actually from an earlier version of edgeR.
I'd much rather see continuous code and output from one session, rather
than output snippets without knowing quite how they were obtained.
Best wishes
Gordon
-------------- original message --------------
[BioC] edgeR: effect of 'outlier' tags on differential expression calls
alessandro.guffanti at genomnia.com alessandro.guffanti at genomnia.com
Tue Apr 24 12:48:22 CEST 2012
Dear colleagues: I am using edgeR to examine differential expression on
small RNA data.
I noticed this problem also when working with SAGE datasets: when just one
of the samples is clearly an outlier, as you can see below for sample 7
(the comparison is 1-4 versus 5-8), there is a call of significant
differential expression which seems to be inappropriate, or at least
should be reexamined.
How can we diagnose these situations before manually checking the tag
counts for all the significant differential expression calls? Please note
that these are tumoral samples, so a high sample-by-sample variability is
expected in principle.
Thanks a lot in advance,
Alessandro
miRNA_ID          1.mirna  2.mirna  3.mirna  4.mirna  5.mirna  6.mirna  7.mirna  8.mirna
hsa-miR-515-3p          3        1        1        1        1        7     1601        3
hsa-miR-518e            4        0        1        0        1        2     1715        2
hsa-miR-520d-3p         0        0        0        0        0        1      243        0
hsa-miR-519c-3p         0        0        0        0        0        1      248        0
hsa-miR-520f            0        0        0        0        0        0      163        0
hsa-miR-519d           12        1        0        1        1        4     1754        1
hsa-miR-520h            0        0        0        0        0        0      189        2
hsa-miR-519c-5p         0        0        0        0        0        0      123        0
hsa-miR-520g           16        1        1        4        2        4     1917        2
hsa-miR-518b            5        0        0        1        1        3      686        1
hsa-miR-517a          100        5        4        2        6       45    10024        3
miRNA_ID            logConc      logFC    P.Value  adj.P.Val
hsa-miR-515-3p    -15.09154   -8.61753    0.00000    0.00082
hsa-miR-518e      -15.30278   -9.22926    0.00000    0.00110
hsa-miR-520d-3p   -18.23592   -9.46747    0.00001    0.00201
hsa-miR-519c-3p   -17.98705   -9.01722    0.00002    0.00338
hsa-miR-520f      -32.04992  -35.93228    0.00002    0.00338
hsa-miR-519d      -14.46073   -7.61177    0.00003    0.00338
hsa-miR-520h      -18.02925   -8.34496    0.00003    0.00338
hsa-miR-519c-5p   -32.25620  -35.51970    0.00004    0.00382
hsa-miR-520g      -14.16219   -7.27220    0.00005    0.00382
hsa-miR-518b      -15.70611   -7.39997    0.00006    0.00382
hsa-miR-517a      -11.74423   -7.21374    0.00006    0.00382
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.6.0 limma_3.12.0
--
Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
Via Nerviano, 31 - 20020 Lainate, Milano, Italy
Ph: +39-0293305.702 Fax: +39-0293305.777
http://www.genomnia.com
"When you're curious, you find lots of interesting things to do."
(Walt Disney)