[Bioc-sig-seq] A bug in the AlignedRead coverage method?
Martin Morgan
mtmorgan at fhcrc.org
Wed May 27 17:56:49 CEST 2009
Hi Nicolas -- can you be explicit about how you are invoking coverage?
The parameterization of the underlying 'coverage' function is changing,
and it is not quite clear whether you are having problems with the
'classic' or the new formulation.
Martin
Nicolas Delhomme wrote:
> Hi all,
>
> (Martin, I guess this one is for you :-))
>
> I have a subset of an AlignedRead
>
> class: AlignedRead
> length: 1323 reads; width: 36 cycles
> chromosome: 2R 2R ... 2R 2R
> position: 5866890 5867511 ... 5867007 5867434
> strand: - + ... + -
> alignQuality: NumericQuality
> alignData varLabels: mismatch
>
> When I call the coverage function, I get back a GenomeData obj. Fine.
>
> Formal class 'GenomeData' [package "BSgenome"] with 4 slots
> ..@ listData :List of 1
> .. ..$ 2R:Formal class 'Rle' [package "IRanges"] with 5 slots
> .. .. .. ..@ values : int [1:888] 1 2 5 9 12 14 17 20 21 22 ...
> .. .. .. ..@ lengths : int [1:888] 7 1 2 1 1 2 1 3 2 2 ...
> .. .. .. ..@ elementMetadata: NULL
> .. .. .. ..@ elementType : chr "ANYTHING"
> .. .. .. ..@ metadata : list()
> ..@ elementMetadata: NULL
> ..@ elementType : chr "ANYTHING"
> ..@ metadata :List of 6
> .. ..$ organism : NULL
> .. ..$ provider : NULL
> .. ..$ providerVersion: NULL
> .. ..$ method : chr "coverage,AlignedRead-method"
> .. ..$ coords : chr "leftmost"
> .. ..$ extend : int 0
>
> What is surprising is when I look at the Rle object I got:
>
> 'integer' Rle instance of length 5868503 with 888 runs
> Lengths: 7 1 2 1 1 2 1 3 2 2 ...
> Values : 1 2 5 9 12 14 17 20 21 22 ...
>
> which tells me that I have a "covered" region of 5868503 bp instead of
> the 1675 bp I'm expecting.
>
> This comes from the last set of length/value from the Rle object:
>
> runLength(test$`2R`)[886:888]
> [1] 179 36 5866828
>
> runValue(test$`2R`)[886:888]
> [1] 0 1 0
>
> I could track it down to the AlignedRead coverage method (l.179 of
> methods-AlignedRead.R in svn rev. 39735), where the code is:
>
> coverage(IRanges(rstart, rend), shift=1L-start, width=end-shift, ...)
>
> As shift is negative, the line should be changed to:
>
> coverage(IRanges(rstart, rend), shift=1L-start, width=end+shift, ...)
>
> But this should not work, as "shift" is not defined beforehand. I would
> have expected it to crash, but apparently it is simply is ignored....
> (there's some R voodoo going on there...) and simply results in this call:
>
> coverage(IRanges(rstart, rend), shift=1L-start, width=end, ...)
>
> Unexpected, but anyway, this corrects it:
>
> coverage(IRanges(rstart, rend), shift=1L-start, width=end+1L-start, ...)
>
> and gives the same results as
>
> coverage(IRanges(rstart, rend), start=start, end=end, ...)
>
> 'integer' Rle instance of length 1675 with 887 runs
> Lengths: 7 1 2 1 1 2 1 3 2 2 ...
> Values : 1 2 5 9 12 14 17 20 21 22 ...
>
> Best,
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> High Throughput Functional Genomics Center
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-sig-sequencing
mailing list