[Bioc-sig-seq] ShortRead internal: too many 'snap' entries
Martin Morgan
mtmorgan at fhcrc.org
Mon Apr 5 00:46:45 CEST 2010
On 04/04/2010 02:07 PM, Yanwei Tan wrote:
> Dear Martin,
>
> I use nFilter to filter out the sequences which contain any "N".
> My code is as follows:
>
> > # read the fastq file
> > fq <- readFastq("/Users/wei/Desktop/Originaldata", pattern="Bic.txt")
> > # filter out reads containing N
> > filt <- nFilter()
> > fq <- fq[filt(fq)]
> > # write the output
> > writeFastq(fq, file="/Users/wei/Desktop/Originaldata/bicfiltered.txt")
>
>
> After I got the filtered fastq file:
>
> > readFastq("/Users/wei/Desktop/Originaldata", "bicfiltered.txt")
> Error in .local(dirPath, pattern,...) :
> ShortRead internal: too many 'snap' entries

Execute these commands

  library(ShortRead)
  example(readFastq)

Then please copy and paste the results of the following commands

  f = tempfile()
  writeFastq(rfq, f)
  readFastq(f)

If your results look like mine:

  > f = tempfile()
  > writeFastq(rfq, f)
  > readFastq(f)
  class: ShortReadQ
  length: 256 reads; width: 36 cycles

then please report the output of

  list.files("/Users/wei/Desktop/Originaldata", "bicfiltered.txt")
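
One thing this check can reveal: the second argument of list.files() is a
regular expression, so a pattern such as "bicfiltered.txt" may match more
than one file. A small sketch with made-up file names in a scratch
directory, assuming readFastq() selects files by pattern in the same way:

```r
# Made-up file names in a scratch directory, for illustration only.
d <- tempfile("fastqdir")
dir.create(d)
invisible(file.create(file.path(d, c("bicfiltered.txt",
                                     "bicfiltered.txt.bak",
                                     "other.txt"))))

# Unanchored: "." matches any character, and the match may occur
# anywhere in the file name, so the backup copy is listed as well.
loose <- list.files(d, "bicfiltered.txt")

# Escaped and anchored: only the exact file name matches.
strict <- list.files(d, "^bicfiltered\\.txt$")

print(loose)
print(strict)
```

If more than one name comes back, an anchored pattern narrows the
selection to the intended file.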

In your commands, after fq[filt(fq)], please report the output of

  fq

Please confirm that you do not manipulate the file produced by
writeFastq() before trying to readFastq().
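
Earlier in the thread (quoted below), a missing newline after the last
record was mentioned in connection with this problem. As a rough sketch
(base R only; checkFastqLayout is a hypothetical helper, not a ShortRead
function, and it assumes every record occupies exactly four lines), the
layout of the written file could be checked like this:

```r
# Hypothetical helper (not part of ShortRead): basic layout checks on a
# FASTQ file, assuming every record occupies exactly four lines.
checkFastqLayout <- function(path) {
    size <- file.info(path)$size
    # Read the raw bytes so the final byte can be inspected directly.
    bytes <- readBin(path, what = "raw", n = size)
    endsWithNewline <- size > 0 && bytes[size] == as.raw(10L)  # 10 is "\n"

    lines <- readLines(path, warn = FALSE)
    n <- length(lines)
    pos <- seq_len(n) %% 4  # position within a four-line record

    list(
        endsWithNewline = endsWithNewline,
        completeRecords = n > 0 && n %% 4 == 0,
        # the first line of each record should start with "@"
        idLinesOk = all(substr(lines[pos == 1], 1, 1) == "@"),
        # the third line of each record should start with "+"
        plusLinesOk = all(substr(lines[pos == 3], 1, 1) == "+")
    )
}

# Usage with a made-up single-record file
f <- tempfile()
writeLines(c("@read1", "ACGT", "+", "IIII"), f)
str(checkFastqLayout(f))
```

A FALSE in any element would point to a file that is incomplete or was
changed after writeFastq() wrote it.
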
Martin
>
> My sessionInfo():
> R version 2.10.1 (2009-12-14)
> x86_64-apple-darwin9.8.0
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] ShortRead_1.4.0 lattice_0.17-26 BSgenome_1.14.2
> Biostrings_2.14.12 IRanges_1.4.11
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1 hwriter_1.1 tools_2.10.1
>
> Many thanks!
> Wei
>
>
> On 4/4/10 10:31 PM, Martin Morgan wrote:
>> On 04/04/2010 11:55 AM, Yanwei Tan wrote:
>>
>>> Hi Ramzi Temanni,
>>>
>>> I ran into the same problem as you when running ShortRead. As Martin
>>> mentioned, a newline is missing after the last record in the file. How
>>> did you fix this problem? I do not know how to add a newline after the
>>> last line. My data is a fastq file; I just filtered out the reads that
>>> contain N using the nFilter function in the ShortRead package.
>>>
>> In off-list email you said
>>
>>
>>> I used the ShortRead package to filter the data and then saved it as
>>> a fastq file. But when I run the qa function again there is an error
>>> in .local(dirPath, pattern, ...): ShortRead internal: too many
>>> 'snap' entries.
>>>
>> It is hard to follow what you are trying to accomplish. Please paste
>> short code to illustrate. Use data files from ShortRead, so that your
>> code is reproducible by others. Include the output of sessionInfo() so
>> that it is clear which version of software you are using. Perhaps after
>>
>> example(readFastq)
>>
>> you do
>>
>> > rfq
>> class: ShortReadQ
>> length: 256 reads; width: 36 cycles
>> > file = tempfile() # a file to save output
>> > noNrfq = rfq[nFilter()(rfq)]
>> > writeFastq(noNrfq, file)
>> > qaresult = qa(dirname(file), basename(file), type="fastq")
>>
>> ? But what is the problem? Note also that it is not necessary to write
>> the fastq file to disk,
>>
>> > qa(list(noNrfq=noNrfq))
>> class: ShortReadQQA(9)
>> QA elements (access with qa[["elt"]]):
>>   readCounts: data.frame(1 3)
>>   baseCalls: data.frame(1 5)
>>   readQualityScore: data.frame(512 4)
>>   baseQuality: data.frame(94 3)
>>   alignQuality: data.frame(1 3)
>>   frequentSequences: data.frame(50 4)
>>   sequenceDistribution: data.frame(3 4)
>>   perCycle: list(2)
>>     baseCall: data.frame(141 4)
>>     quality: data.frame(341 5)
>>   perTile: list(2)
>>     readCounts: data.frame(0 4)
>>     medianReadQualityScore: data.frame(0 4)
>>
>> This is my sessionInfo()
>>
>> > sessionInfo()
>> R version 2.10.1 Patched (2010-03-27 r51570)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] ShortRead_1.4.0 lattice_0.18-3 BSgenome_1.14.2 Biostrings_2.14.12
>> [5] IRanges_1.4.16
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.6.1 grid_2.10.1 hwriter_1.2 tools_2.10.1
>>
>>
>>> Many thanks in advance!
>>> Wei
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>
>
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793