[Bioc-sig-seq] scanBam Error
Martin Morgan
mtmorgan at fhcrc.org
Thu Dec 16 18:45:05 CET 2010
On 12/13/2010 02:00 PM, Dario Strbenac wrote:
> Hi,
>
> Yes, that works fine, thanks. It must've been a size issue I was having.
Rsamtools 1.2.2 in release has been updated to say

  too many records, use 'param=ScanBamParam(which=<...>)'

when the number of reads and their lengths add up to more than 2^31-1
nucleotides. The devel version of Rsamtools currently does the same,
but the intention is to arrive at a more robust solution.
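For a rough sense of where that limit sits, here is a back-of-the-envelope sketch in base R. The figure of 75 million reads comes from earlier in the thread; the read lengths are hypothetical, since the thread does not state one.

```r
# R's integer type is a 32-bit signed integer, so its maximum is 2^31 - 1.
max_int <- .Machine$integer.max

# Read count reported in the thread (about 75 million reads).
n_reads <- 75e6

# Hypothetical read lengths; the thread does not say what the real one was.
read_length <- c(25, 36, 50, 75, 100)

# Multiply as doubles so the product itself cannot overflow in R.
total_nt <- n_reads * read_length

overflow_table <- data.frame(read_length, total_nt,
                             exceeds_limit = total_nt > max_int)
print(overflow_table)
```

Under these assumptions, 75 million reads of 29 nucleotides or more already exceed the limit. A 32-bit signed count that overflows in C typically wraps to a negative number, which would be consistent with the "negative length vectors are not allowed" error quoted below, though that is an inference and not something confirmed in the thread.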
I think this addresses the problem, but would be happy to know if the
original example still fails.
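One way to follow the advice in that message is to read the file one reference sequence at a time, so that no single scanBam() call has to hold every read. What follows is only a sketch, not the planned "more robust solution". The function name count_reads_by_chromosome is made up for illustration, the example runs on the small ex1.bam file that ships with Rsamtools, and the BAM file must have an index alongside it. Note that 'flag' is passed to ScanBamParam(), not to scanBam().

```r
library(Rsamtools)
library(GenomicRanges)

# Hypothetical helper: scan a BAM file one reference sequence at a time
# and keep only a small summary (the read count) for each.
count_reads_by_chromosome <- function(bam_file) {
    # Named integer vector of reference sequence lengths from the header.
    targets <- scanBamHeader(bam_file)[[1]]$targets

    vapply(names(targets), function(chr) {
        region <- GRanges(chr, IRanges(1, targets[[chr]]))
        param <- ScanBamParam(
            which = region,
            flag = scanBamFlag(isDuplicate = FALSE),
            what = c("rname", "strand", "pos", "seq"))
        # With a single range in 'which', the result has one element.
        aln <- scanBam(bam_file, param = param)[[1]]
        length(aln$pos)
    }, integer(1))
}

# Small example file bundled with Rsamtools; substitute your own BAM file.
ex1 <- system.file("extdata", "ex1.bam", package = "Rsamtools")
counts <- count_reads_by_chromosome(ex1)
print(counts)
```

In real use the body of the inner function would do whatever per-chromosome processing is needed before discarding the reads, and very large chromosomes could be split into smaller ranges in the same way.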
Martin
>
> ---- Original message ----
>> Date: Mon, 13 Dec 2010 17:31:24 +1000
>> From: Paul Leo <p.leo at uq.edu.au>
>> Subject: Re: [Bioc-sig-seq] scanBam Error
>> To: D.Strbenac at garvan.org.au
>> Cc: bioc-sig-sequencing at r-project.org
>>
>> Do you need all the sequence data at once?
>>
>> Instead of using a smaller BAM file, can you read in
>> a smaller portion of your large BAM file?
>>
>> data.gr <- GRanges(seqnames = paste("chr", 13, sep = ""),
>>                    ranges = IRanges(start = 28608234, end = 28608363),
>>                    strand = "+")
>>
>> which <- data.gr
>> params <- ScanBamParam(which = which,
>>                        flag = scanBamFlag(isUnmappedQuery = FALSE,
>>                                           isDuplicate = NA,
>>                                           isValidVendorRead = TRUE),
>>                        simpleCigar = FALSE,
>>                        reverseComplement = FALSE,
>>                        what = c("qname", "flag", "rname", "seq", "strand",
>>                                 "pos", "mpos", "qwidth", "cigar", "qual",
>>                                 "mapq", "isize", "mrnm"),
>>                        tag = "RG") # change to what you want
>> aln1 <- scanBam("HS1808.bam", param = params)
>>
>> aln1[[1]]
>>
>> That should work fine?
>>
>> --
>> Dr Paul Leo
>> Bioinformatician
>> UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine
>> ---------------------------------------------------------------------
>> Level 4, R Wing
>> Princess Alexandra Hospital
>> Ipswich Rd
>> Woolloongabba QLD 4102
>> Tel: +61 7 3240 7740 Mob: 041 303 8691 Fax: +61 7 3240 5946
>> Email: p.leo at uq.edu.au Web: http://www.di.uq.edu.au
>>
>> -----Original Message-----
>> From: Dario Strbenac <D.Strbenac at garvan.org.au>
>> Reply-to: D.Strbenac at garvan.org.au
>> To: bioc-sig-sequencing at r-project.org
>> Subject: Re: [Bioc-sig-seq] scanBam Error
>> Date: Mon, 13 Dec 2010 17:15:38 +1100
>>
>> I tried it out by making a smaller BAM file with only reads from one
>> chromosome, and it worked fine. The full BAM file is 4 GB and has 75
>> million reads in it. Could the size be a problem? Could you test a BAM
>> file of this size on your end, without me sending you one that big?
>> Also, the error is different after I put the ScanBamParam in the right
>> spot:
>>
>> Error in .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...) :
>> negative length vectors are not allowed
>>
>> Integer overflow somewhere, maybe ?
>>
>> - Dario.
>>
>> ---- Original message ----
>>> Date: Sun, 12 Dec 2010 20:59:23 -0800
>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>> Subject: Re: [Bioc-sig-seq] scanBam Error
>>> To: D.Strbenac at garvan.org.au
>>> Cc: bioc-sig-sequencing at r-project.org
>>>
>>> On 12/12/2010 08:00 PM, Dario Strbenac wrote:
>>>> Hello,
>>>>
>>>>
>>>> I'm having trouble reading in a BAM file when "seq" is one of the
>>>> strings passed to the what argument of ScanBamParam. If it's not, then
>>>> the reading completes successfully. I don't understand what the
>>>> error means. It is:
>>>>
>>>> Error in .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param) :
>>>> INTEGER() can only be applied to a 'integer', not a 'closure'
>>>>
>>>> The traceback is :
>>>>
>>>>> traceback()
>>>> 4: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>>> 3: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param = param)
>>>> 2: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>> param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>> "strand", "pos", "seq")))
>>>> 1: scanBam("HS1808.bam", flag = ScanBamFlag(isDuplicate = FALSE),
>>>> param = ScanBamParam(reverseComplement = TRUE, what = c("rname",
>>>> "strand", "pos", "seq")))
>>>>
>>>> and the environment is :
>>>>
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C LC_TIME=English_Australia.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] Rsamtools_1.2.1 Biostrings_2.18.0 GenomicRanges_1.2.0 IRanges_1.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.8.0
>>>
>>> Hi Dario -- this is some kind of error in Rsamtools' C code, but I'm not
>>> able to reproduce it on my end so can't track it down. Is there any way
>>> of producing and sharing with me an example file that has this problem?
>>>
>>> One thing (not causing the bug) in your traceback is that 'flag' should
>>> be an argument to ScanBamParam; as it is I think it is being silently
>>> ignored.
>>>
>>> Martin
>>>
>>>>
>>>> --------------------------------------
>>>> Dario Strbenac
>>>> Research Assistant
>>>> Cancer Epigenetics
>>>> Garvan Institute of Medical Research
>>>> Darlinghurst NSW 2010
>>>> Australia
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793