[Bioc-sig-seq] Normalization in RNA-seq
Mayte Suarez-Farinas
farinam at mail.rockefeller.edu
Wed Nov 24 19:38:34 CET 2010
Dear All,
I have been working with paired samples from 3 subjects using the
edgeR package, and I am puzzled by the results of my normalization.
After normalization, the data are skewed towards the LS group, and as
a result I get many more genes up-regulated than down-regulated. We
have studied this disease extensively in large sample sets with
microarrays and this is not the case there, so now I am suspicious of
my normalization.
I am including the code and a PDF with the smear plots obtained using
the normalization options in edgeR. With all of them the data look
worse after normalization than before.
If someone can look at what I did and point out any mistake, I would
really appreciate it.
I don't know whether the problem is that I am deleting the unmapped
reads before normalization. I was instructed to do so on the
SEQanswers forum.
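## Reading Files
library(edgeR)  # readDGE(), calcNormFactors(), plotSmear()
library(affy)   # ma.plot()
files <- dir(pattern="\\.counts\\.txt$")
# group = first two characters of the file name, patient = next two
files.pheno <- data.frame(files=files,
    group=factor(substr(files,1,2), levels=c("NL","LS")),
    Patient=factor(substr(files,3,4)))
PScounts <- readDGE(files.pheno)
colnames(PScounts) <- paste(PScounts$samples$group,
    PScounts$samples$Patient, sep='-')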
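## Delete unmapped reads
unmapped <- c('no_feature','ambiguous','not aligned','too low aQual')
# logical negation is safe even if no row name matches;
# -which() would drop every row in that case
PScounts <- PScounts[!(rownames(PScounts$counts) %in% unmapped),]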
#Calculate Normalizations
d.PS <- calcNormFactors(PScounts)
pdf('Normalization Plots.pdf',height=10,width=10)
layout(matrix(1:4,2,2,byrow=TRUE))
a<-plotSmear(PScounts,
panel.first=grid(),smooth.scatter=FALSE,main='before normalization')
ma.plot(a$A,a$M,plot.method='add',cex=0)
b<-plotSmear(d.PS, panel.first=grid(),smooth.scatter=FALSE,
main='after TMM')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
d.PS.2 <- calcNormFactors(PScounts,method='RLE')
# plot the RLE-normalized object (d.PS.2), not the TMM one (d.PS)
b<-plotSmear(d.PS.2, panel.first=grid(),smooth.scatter=FALSE,
main='after RLE')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
d.PS.3 <- calcNormFactors(PScounts,method='quantile')
b<-plotSmear(d.PS.3,
panel.first=grid(),smooth.scatter=FALSE,main='after quantile')
ma.plot(b$A,b$M,plot.method='add',cex=0)
rm(b)
dev.off()
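A side note on the filtering step in the script above, as a minimal
base-R sketch: x[-which(cond), ] returns zero rows when cond is never
TRUE, because -integer(0) is still integer(0). The toy matrix and its
row names below are hypothetical and are not taken from the actual
count files; the sketch only illustrates why logical negation may be
the safer idiom, and how one could check which of the listed names are
really present as row names.

```r
# Toy count matrix; the row names are hypothetical placeholders
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("geneA", "geneB", "no_feature", "ambiguous"),
                                 c("S1", "S2")))

special <- c("no_feature", "ambiguous", "not aligned", "too low aQual")

# Which of the listed names actually occur as row names?
found <- special %in% rownames(counts)
names(found) <- special
print(found)

# Logical negation keeps every row whose name is not in 'special'
kept <- counts[!(rownames(counts) %in% special), , drop = FALSE]

# Pitfall: when nothing matches, which() returns integer(0), and
# indexing with -integer(0) selects zero rows instead of all rows
no_match <- c("not aligned", "too low aQual")
dropped_all <- counts[-which(rownames(counts) %in% no_match), , drop = FALSE]
safe <- counts[!(rownames(counts) %in% no_match), , drop = FALSE]

print(nrow(dropped_all))
print(nrow(safe))
```

Printing `found` on the real row names would show directly whether
every entry of the exclusion list matches a row in the count table.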
> d.PS$sample ###(after TMM)
                 files group Patient lib.size norm.factors
LS-25 LS252.counts.txt    LS      25 23067191       0.9085
LS-28 LS287.counts.txt    LS      28 20684675       0.9056
LS-29 LS292.counts.txt    LS      29 19881245       0.9965
NL-25 NL251.counts.txt    NL      25 19665929       1.0129
NL-28 NL286.counts.txt    NL      28 22938039       1.1554
NL-29 NL291.counts.txt    NL      29 20541691       1.0422
>
> d.PS.2$sample ###after RLE
                 files group Patient lib.size norm.factors
LS-25 LS252.counts.txt    LS      25 23067191       0.9495
LS-28 LS287.counts.txt    LS      28 20684675       0.9898
LS-29 LS292.counts.txt    LS      29 19881245       1.0385
NL-25 NL251.counts.txt    NL      25 19665929       0.9592
NL-28 NL286.counts.txt    NL      28 22938039       1.0572
NL-29 NL291.counts.txt    NL      29 20541691       1.0104
> d.PS.3$sample ###after quantiles
                 files group Patient lib.size norm.factors
LS-25 LS252.counts.txt    LS      25 23067191       0.8659
LS-28 LS287.counts.txt    LS      28 20684675       0.9656
LS-29 LS292.counts.txt    LS      29 19881245       1.1302
NL-25 NL251.counts.txt    NL      25 19665929       0.8887
NL-28 NL286.counts.txt    NL      28 22938039       1.0885
NL-29 NL291.counts.txt    NL      29 20541691       1.0939
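
To make the tables above easier to interpret, here is a minimal base-R
sketch, assuming the usual edgeR convention that the effective library
size is lib.size * norm.factors. It re-enters the TMM values printed
above by hand and computes, for each patient, the constant that the
factors add to every LS-vs-NL log2 fold change compared with scaling
by raw library size alone. The variable names are illustrative only.

```r
# Values copied from the TMM table above
samples  <- c("LS-25", "LS-28", "LS-29", "NL-25", "NL-28", "NL-29")
lib.size <- c(23067191, 20684675, 19881245, 19665929, 22938039, 20541691)
tmm      <- c(0.9085, 0.9056, 0.9965, 1.0129, 1.1554, 1.0422)
names(lib.size) <- names(tmm) <- samples

# Effective library size used to scale the counts
eff.lib.size <- lib.size * tmm

patients <- c("25", "28", "29")
ls.ids <- paste0("LS-", patients)
nl.ids <- paste0("NL-", patients)

# logFC = log2(cLS / effLS) - log2(cNL / effNL), so relative to
# library-size-only scaling the factors add -log2(f_LS / f_NL)
shift <- -log2(tmm[ls.ids] / tmm[nl.ids])
names(shift) <- patients

print(round(eff.lib.size))
print(round(shift, 3))
```

A positive value means the factors move that pair's log fold changes
towards LS. With the TMM values above, all three shifts are positive.
This only describes the size of the shift; it does not say whether the
shift is appropriate for these data.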
Mayte Suarez-Farinas
Research Associate, The Rockefeller University
Biostatistician, The Rockefeller University Hospital
1230 York Ave, Box 178,
New York, NY, 10065
+1(212) 327-8213
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Normalization Plots.pdf
Type: application/pdf
Size: 10856331 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20101124/5a1869c4/attachment-0001.pdf>