[Bioc-sig-seq] ReadFastq error
Hervé Pagès
hpages at fhcrc.org
Sat Feb 20 01:36:25 CET 2010
Ramzi,
In case you have trouble installing R-devel + Bioc-devel, or don't want
to install them, here is code that should work with both release and
devel (my sessionInfo is at the end):
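library(Biostrings)
# Load as a BStringSet so that non-DNA letters don't cause an error
bset <- read.BStringSet("path/to/your/file", format="fastq")
# Columns of the alphabetFrequency() matrix that correspond to DNA letters
# (byte value + 1, since the first column is for byte value 0)
dnaletter_cols <- as.integer(
    BString(paste(DNA_ALPHABET, collapse=""))) + 1L
# Number of DNA letters in each record
ndnaletter_per_string <-
    rowSums(alphabetFrequency(bset)[ , dnaletter_cols])
# Indices of the records that contain at least 1 non-DNA letter
which(ndnaletter_per_string != width(bset))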
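The same idea (count the DNA letters in each record and compare that count with the record's width) can be sketched in base R on a toy character vector, which may help when checking the logic without Biostrings installed. Everything below is an assumption made for illustration: the reads are made up, the helper count_dna_letters() is hypothetical, and dna_letters is a hand-written set of IUPAC codes plus the gap character that is not guaranteed to match DNA_ALPHABET exactly.

```r
# Toy reads standing in for the sequences of a fastq file (made up).
reads <- c(r1 = "ACGTN", r2 = "ACGIT", r3 = "GGCA-", r4 = "IIII")

# Assumed set of accepted letters: IUPAC nucleotide codes plus the gap
# character. Hand-written for this sketch, not taken from Biostrings.
dna_letters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                 "V", "H", "D", "B", "N", "-")

# Hypothetical helper: number of accepted letters in each string.
count_dna_letters <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)
  vapply(chars, function(ch) sum(ch %in% dna_letters), integer(1))
}

ndna <- count_dna_letters(reads)

# Indices of the reads that contain at least 1 letter outside the set.
bad <- which(ndna != nchar(reads))

# The offending letters found in each of those reads.
bad_letters <- lapply(strsplit(reads[bad], "", fixed = TRUE),
                      function(ch) unique(ch[!ch %in% dna_letters]))
```

With these toy reads, bad points at the 2nd and 4th entries, and bad_letters shows that "I" is the only letter outside the assumed set.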
Cheers,
H.
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.11
loaded via a namespace (and not attached):
[1] Biobase_2.6.1 tools_2.10.1
Hervé Pagès wrote:
> Hi Ramzi,
>
> One thing you can try is loading your fastq file with:
>
> library(Biostrings)
> bset <- read.BStringSet("path/to/your/file", format="fastq")
>
> Note the use of read.BStringSet() instead of read.DNAStringSet().
>
> Since BString/BStringSet objects are not limited to the DNA alphabet
> (see ?DNA_ALPHABET), you should be able to load your file even if
> it contains non-DNA letters (unless it has other problems of course).
>
> Then you can do something like:
>
> ndnaletter_per_string <-
> vcountPDict(BStringSet(DNA_ALPHABET), bset, collapse=2)
> which(ndnaletter_per_string != width(bset))
>
> to extract the list of fastq records (as an integer vector) that
> contain at least 1 non-DNA letter. (Note that the code above works
> only with R-devel + BioC-devel.)
>
> That way you'll be able to know if you have records like this and
> where they are.
>
> readFastq() won't load a fastq file with non-DNA letters in it.
>
> Cheers,
> H.
>
>
> Ramzi TEMANNI wrote:
>> Hi,
>> I'm encountering the following error when trying to load a fastq file:
>>
>> Error in .local(dirPath, pattern, ...) :
>> _DNAencode(): key 73 not in lookup table
>>
>> Key 73 in the ASCII table is "I" (capital i).
>>
>> Has anyone encountered this error before?
>>
>> Thanks in advance for your help
>>
>> Regards,
>> Ramzi
>>
>> > sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> x86_64-pc-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] biomaRt_2.2.0 ShortRead_1.4.0 lattice_0.18-3 BSgenome_1.14.2
>> [5] Biostrings_2.14.12 IRanges_1.4.11
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.6.1 grid_2.10.1 hwriter_1.1 RCurl_1.3-1 XML_2.6-0
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319