[Bioc-sig-seq] A modified readFASTA for reading 'quality fasta' files.
Dan Bolser
dan.bolser at gmail.com
Tue Aug 26 13:49:08 CEST 2008
Hi all,
I have been using phred/phrap to do some sequence assembly, and I
thought that it might be useful to plot the 'quality scores' that are
output by phred.
In case you are interested, I have data from the ABI_3730, big-dye
terminator (DT3730POP7{BDv3}.mob).
I have adapted the readFASTA function in the Biostrings library to
read in the 'quality fasta' files produced by phred. Here is a diff of
the two functions so you can see what I did on the off chance that it
is useful for somebody else:
*** readFASTA.R
--- readQualityFASTA.R
***************
*** 1,5 ****
! readFASTA <-
function (file, checkComments = TRUE, strip.desc = TRUE)
{
if (missing(strip.desc))
--- 1,5 ----
! readQualityFASTA <-
function (file, checkComments = TRUE, strip.desc = TRUE)
{
if (missing(strip.desc))
***************
*** 37,43 ****
desc <- s1[descriptions[i]]
if (strip.desc)
desc <- substr(desc, 2L, nchar(desc))
! seq <- paste(s1[dp[i]:end[i]], collapse = "")
list(desc = desc, seq = seq)
})
}
--- 37,44 ----
desc <- s1[descriptions[i]]
if (strip.desc)
desc <- substr(desc, 2L, nchar(desc))
! seq <- paste(s1[dp[i]:end[i]], collapse = " ")
! seq <- as.numeric(strsplit(seq, " ", perl=TRUE)[[1]])
list(desc = desc, seq = seq)
})
}
I am relatively new to BioC (and sequencing in general) so I am not
sure if there isn't a better way to do this... please don't hesitate
to point me at the relevant docs! In particular, where can I read
about the sequence assembly capabilities of R?
All the best,
Dan.
----
http://network.nature.com/profile/dan
More information about the Bioc-sig-sequencing
mailing list