[Bioc-sig-seq] Rsamtools problem reading seq information

ivan.borozan at utoronto.ca ivan.borozan at utoronto.ca
Thu Nov 11 19:17:44 CET 2010


Hi Martin,

Thanks. The bam file was corrupted.

Ivan

Quoting Martin Morgan <mtmorgan at fhcrc.org>:

> Hi Ivan --
>
> On 11/11/2010 08:25 AM, ivan.borozan at utoronto.ca wrote:
>> Hello list,
>>
>> I have scanned a large bam (15G) file from Bioscope (SOLID) using
>> Rsamtools and the code below:
>>
>>> library(Rsamtools)
>> Loading required package: GenomicRanges
>>>
>>>
>>> p4<- ScanBamParam(what = c("seq"), flag = scanBamFlag(isUnmappedQuery
>>> = TRUE))
>>>
>>> res3 <- scanBam("test.bam",param=p4, maxMemory=5000)[[1]]
>
> the 'maxMemory' argument is being silently ignored by scanBam, it
> doesn't do anything.
>
> The problem could be in the BAM file, in samtools, or in Rsamtools. To
> narrow down you might confirm basic functionality in Rsamtools (e.g.,
> example(scanBam)), reading the header of the file (scanBamHeader), and
> loading a smaller portion of the bam file in Rsamtools (using the
> 'which' argument of ScanBamParam). It might also be informative to know
> whether the issue is with seq/qual, or with other parts of the reads, in
> particular what=c("rname", "pos").
>
> Outside R, you might try to view a small portion of your bam file with
>
>   samtools view <your-file-here> chr1:100-200|wc -l
>
> or similar
>
> Martin
>
>>
>>
>> it is not clear to me why I get all sequences as
>>
>>
>>> res3$seq[1]
>>
>>  A DNAStringSet instance of length 1
>>     width seq
>> [1]     1 N
>>
>>
>> and all Phred-encoded, phred-scaled base quality scores as:
>>
>>
>>> p4<- ScanBamParam(what = c("qual"), flag = scanBamFlag(isUnmappedQuery
>>> = TRUE))
>>>
>>>
>>> res3 <-
>>> scanBam("solid0085_20090610_ICGC_Xeno_wholetranscrptome_4041X_F3_sortedByReadId.bam",param=p4,
>>> maxMemory=5000)[[1]]
>>>
>>
>>
>>
>>> res3$qual[1]
>>   A PhredQuality instance of length 1
>>     width seq
>> [1]     1 !
>>
>>
>> Many thanks for any suggestions,
>>
>> Ivan
>>
>>
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Rsamtools_1.2.0     GenomicRanges_1.2.0 Biostrings_2.18.0
>> [4] IRanges_1.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.10.0 tools_2.12.0
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>



More information about the Bioc-sig-sequencing mailing list