[Bioc-sig-seq] question about replace

Martin Morgan mtmorgan at fhcrc.org
Thu Sep 8 04:28:23 CEST 2011


On 09/07/2011 11:34 AM, wang peter wrote:
> i replaced the low quality position with "N"
> and then trimmed N from 5' and 3'
> i got the reads which only contain the "N" in the middle
> if i want to find those reads before replacement by "N"
> WHAT CAN I DO?

Hi Peter --

Not sure what you mean, probably not this

library(ShortRead)
example(readFastq)  ## defines 'rfq' variable
startsWithN <- as.character(narrow(sread(rfq), 1, 1)) == "N"
endsWithN <-
     as.character(narrow(sread(rfq), width(rfq), width(rfq))) == "N"
middleN <-
     !startsWithN & !endsWithN & alphabetFrequency(sread(rfq))[,"N"] != 0
rfq[!middleN]

I think below your code can be simplified with

   at <- as(quality(read), "matrix") < qualityCutoff

Martin

>
> THX
> such is coding
>
> library(ShortRead)
> reads<- readFastq(fastqfile);#
> ids<- id(reads); #
> seqs<- sread(reads);
>
> nCount<-alphabetFrequency(seqs)[,"N"]
> nDist<- table(nCount)
>
> qualityCutoff<- 20
> qual<- PhredQuality(quality(quality(reads)))
> myqual_mat<- matrix(charToRaw(as.character(unlist(qual))),
> nrow=length(qual), byrow=TRUE) # convert quality score to matrix
> at<- myqual_mat<
> charToRaw(as.character(PhredQuality(as.integer(qualityCutoff))))
> letter_subject<- DNAString(paste(rep.int("N", width(seqs)[1]),
> collapse=""))
> letter<- as(Views(letter_subject, start=1, end=rowSums(at)),
> "DNAStringSet")
> injectedseqs<- replaceLetterAt(seqs, at, letter)
>
> trimmed<-trimLRPatterns(Rpattern = letter_subject, Lpattern =
> letter_subject,subject = injectedseqs)
> nCount<-alphabetFrequency(trimmed)[,"N"]
> nDist<- table(nCount)
>
> thank u
>
> shan gao
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioc-sig-sequencing mailing list