[Bioc-sig-seq] ReadFastq error
Hervé Pagès
hpages at fhcrc.org
Fri Feb 19 23:39:47 CET 2010
Hi Ramzi,
One thing you can try is loading your fastq file with:
    library(Biostrings)
    bset <- read.BStringSet("path/to/your/file", format="fastq")
Note the use of read.BStringSet() instead of read.DNAStringSet().
Since BString/BStringSet objects are not limited to the DNA alphabet
(see ?DNA_ALPHABET), you should be able to load your file even if
it contains non-DNA letters (unless it has other problems of course).
Then you can do something like:
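    # For each record, count the letters that belong to the DNA alphabet
    ndnaletter_per_string <-
        vcountPDict(BStringSet(DNA_ALPHABET), bset, collapse=2)
    # Records where that count differs from the record's width
    which(ndnaletter_per_string != width(bset))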
to get the indices (as an integer vector) of the fastq records that
contain at least one non-DNA letter. (Note that the code above works
only with R-devel + BioC-devel.)
That way you'll know whether you have such records and where they
are.
readFastq() won't load a fastq file with non-DNA letters in it.
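If R-devel + BioC-devel is not an option, the same check can be roughly
sketched in base R, without Biostrings. This is only an illustration
under a few assumptions: the file uses strict 4-line fastq records
(header, sequence, "+" line, qualities), the allowed letters are the
uppercase IUPAC nucleotide codes only (the gap and other special
characters of DNA_ALPHABET are left out), and find_non_dna_records() is
a hypothetical helper name, not a function from Biostrings or ShortRead.

```r
# Decode the key reported in the error message: 73 is the ASCII code of "I".
intToUtf8(73)

# Hypothetical helper (not part of any package): return the indices of the
# fastq records whose sequence line contains a character outside 'allowed'.
find_non_dna_records <- function(path, allowed = "ACGTMRWSYKVHDBN") {
    lines <- readLines(path)
    if (length(lines) < 2L)
        return(integer(0))
    # Assumption: strict 4-line records, so sequences sit on lines 2, 6, 10, ...
    seqs <- lines[seq.int(2L, length(lines), by = 4L)]
    # Flag a record as soon as one character is not in the allowed set.
    bad <- grepl(sprintf("[^%s]", allowed), seqs)
    which(bad)
}

# Usage on a small temporary file: the second record has an "I" in its
# sequence line.
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGTN", "+", "IIIII",
             "@r2", "ACIGT", "+", "IIIII"), tmp)
bad_records <- find_non_dna_records(tmp)
bad_records
```

Reading the whole file with readLines() is fine for a quick diagnosis
but may be too memory-hungry for very large files; in that case the
Biostrings approach above is probably the better choice.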
Cheers,
H.
Ramzi TEMANNI wrote:
> Hi,
> I'm encountering the following error when trying to load fastq file:
>
> Error in .local(dirPath, pattern, ...) :
> _DNAencode(): key 73 not in lookup table
>
> Key 73 in ascii table is "I" (capital i)
>
> Anyone had encountered such error before ?
>
> Thanks in advance for your help
>
> Regards,
> Ramzi
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.2.0 ShortRead_1.4.0 lattice_0.18-3 BSgenome_1.14.2
> [5] Biostrings_2.14.12 IRanges_1.4.11
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1 hwriter_1.1 RCurl_1.3-1 XML_2.6-0
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319