[BioC] Checking RNAseq DESeq pipeline
nathalie
nac at sanger.ac.uk
Tue Mar 6 16:54:06 CET 2012
Hi all,
I have done an differential expression analysis using DESeq on a set of
data and got a lot of differential expressed genes at the end. I would
like people to check the code and the assumptions I made.
A-Samples
I am analysing 6 samples in total from 3 different cell types (E, S and H),
S1and S2 ( biological replicates) H1, H2 and H3 ( biological
replicate) E (no replicate)
We took from one animal cells called S ( sample S1) we passaged several
time these cells and isolated samples we called S2 ( sample S2)
from the cell population S ,we took subclones which we cultivated with
factors to obtain 3 independant cell lines of the same cell type which
were derived independently from the S population, called H1 H2 and H3
(biological replicates).
AT last, another cell line was derived from clone H1 which we call E1
(only sample).
B-standard RNA seq was performed in these 6 samples
AFter sequencing, we aligned the samples using topHat and count reads
with HTSEQ python script.
Duplicates haven't been removed in this process. I plan to do one
analysis with removed duplicates.
C-then once in DESeq I have created my count table and followed the
vignette-I have test the diff Exp of condition F vs H
> conds
[1] E F F H H H
Levels: E F H
> head(countsTable)
E F F.1 H H.1 H.2
1 789 195 235 1019 938 1048
2 1 0 0 11 17 31
3 543 325 460 818 883 644
4 103 89 103 114 87 96
5 281 41 102 328 245 387
6 1 2 1 9 8 7
> cds <- newCountDataSet( countsTable, conds )
> sizeFactors(cds)
E F F.1 H H.1 H.2
1.0417242 0.6740896 0.8994357 1.2268668 1.1499580 1.1831515
> cds<-estimateDispersions(cds)
> str( fitInfo(cds) )
List of 5
$ perGeneDispEsts: num [1:54665] 0.000954 0.592346 0.023319 0.000245
0.055592 ...
$ dispFunc :function (q)
..- attr(*, "coefficients")= Named num [1:2] 0.0581 1.8325
.. ..- attr(*, "names")= chr [1:2] "asymptDisp" "extraPois"
..- attr(*, "fitType")= chr "parametric"
$ fittedDispEsts : num [1:54665] 0.061 0.2741 0.0612 0.0766 0.0669 ...
$ df : int 3
$ sharingMode : chr "maximum"
>head(fData(cds))
disp_pooled
1 0.06096110
2 0.59234620
3 0.06124489
4 0.07657296
5 0.06688364
> str( fitInfo(cds) )
List of 5
$ perGeneDispEsts: num [1:54665] 0.000954 0.592346 0.023319 0.000245
0.055592 ...
$ dispFunc :function (q)
..- attr(*, "coefficients")= Named num [1:2] 0.0581 1.8325
.. ..- attr(*, "names")= chr [1:2] "asymptDisp" "extraPois"
..- attr(*, "fitType")= chr "parametric"
$ fittedDispEsts : num [1:54665] 0.061 0.2741 0.0612 0.0766 0.0669 ...
$ df : int 3
$ sharingMode : chr "maximum"
> res <- nbinomTest( cds, "H", "F" )
> head(res)
id baseMean baseMeanA baseMeanB foldChange log2FoldChange
pval
1 1 616.515373 844.007629 275.276988 0.3261546 -1.6163720 6.817493e-06
2 2 9.990057 16.650096 0.000000 0.0000000 -Inf 2.662914e-03
3 3 594.493135 659.634053 496.781759 0.7531172 -0.4090537 2.522510e-01
4 4 99.251992 83.237929 123.273085 1.4809725 0.5665449 1.666241e-01
5 5 196.343734 269.163821 87.113605 0.3236453 -1.6275146 6.472665e-05
6 6 4.857542 6.736313 2.039386 0.3027452 -1.7238239 2.449359e-01
padj
1 4.775233e-05
2 1.088826e-02
3 5.047015e-01
4 3.642458e-01
5 3.807362e-04
6 4.934803e-01
plotDE <- function( res )
plot(
res$baseMean,
res$log2FoldChange,
log="x", pch=20, cex=.3,
col = ifelse( res$padj < .1, "red", "black" ) )
I have attached As pdf the plotDE( res ) , hist(res$pval, breaks=100,
col="skyblue", border="slateblue", main="") and plotDispEsts(cds) for a
basic idea of the data.
I have ~10k genes which show a v low pvalue for DExpression which seem
a lot.
I have several questions,
1-Concerning the experimental design I have described in A , can the
samples be considered as real bio replicate?
2-Would it be "Better" to get rid of duplicated reads prior to HTSeq
3-How can I be more stringent on the expression?
many thanks
Nat
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.pdf
Type: application/pdf
Size: 307013 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120306/ffd12aef/attachment.pdf>
More information about the Bioconductor
mailing list