[Bioc-sig-seq] A bug in the AlignedRead coverage method?
Nicolas Delhomme
delhomme at embl.de
Wed May 27 16:40:14 CEST 2009
Hi all,
(Martin, I guess this one is for you :-))
I have a subset of an AlignedRead
class: AlignedRead
length: 1323 reads; width: 36 cycles
chromosome: 2R 2R ... 2R 2R
position: 5866890 5867511 ... 5867007 5867434
strand: - + ... + -
alignQuality: NumericQuality
alignData varLabels: mismatch
When I call the coverage function, I get back a GenomeData obj. Fine.
Formal class 'GenomeData' [package "BSgenome"] with 4 slots
..@ listData :List of 1
.. ..$ 2R:Formal class 'Rle' [package "IRanges"] with 5 slots
.. .. .. ..@ values : int [1:888] 1 2 5 9 12 14 17 20 21
22 ...
.. .. .. ..@ lengths : int [1:888] 7 1 2 1 1 2 1 3 2 2 ...
.. .. .. ..@ elementMetadata: NULL
.. .. .. ..@ elementType : chr "ANYTHING"
.. .. .. ..@ metadata : list()
..@ elementMetadata: NULL
..@ elementType : chr "ANYTHING"
..@ metadata :List of 6
.. ..$ organism : NULL
.. ..$ provider : NULL
.. ..$ providerVersion: NULL
.. ..$ method : chr "coverage,AlignedRead-method"
.. ..$ coords : chr "leftmost"
.. ..$ extend : int 0
What is surprising is when I look at the Rle object I got:
'integer' Rle instance of length 5868503 with 888 runs
Lengths: 7 1 2 1 1 2 1 3 2 2 ...
Values : 1 2 5 9 12 14 17 20 21 22 ...
which tells me that I have a "covered" region of 5868503 bp instead of
the 1675 bp I'm expecting.
This comes from the last set of length/value from the Rle object:
runLength(test$`2R`)[886:888]
[1] 179 36 5866828
runValue(test$`2R`)[886:888]
[1] 0 1 0
I could track it down to the AlignedRead coverage method (l.179 of
methods-AlignedRead.R in svn rev. 39735), where the code is:
coverage(IRanges(rstart, rend), shift=1L-start, width=end-shift, ...)
As shift is negative, the line should be changed to:
coverage(IRanges(rstart, rend), shift=1L-start, width=end+shift, ...)
But this should not work, as "shift" is not defined beforehand. I
would have expected it to crash, but apparently it is simply is
ignored.... (there's some R voodoo going on there...) and simply
results in this call:
coverage(IRanges(rstart, rend), shift=1L-start, width=end, ...)
Unexpected, but anyway, this corrects it:
coverage(IRanges(rstart, rend), shift=1L-start, width=end+1L-start, ...)
and gives the same results as
coverage(IRanges(rstart, rend), start=start, end=end, ...)
'integer' Rle instance of length 1675 with 887 runs
Lengths: 7 1 2 1 1 2 1 3 2 2 ...
Values : 1 2 5 9 12 14 17 20 21 22 ...
Best,
---------------------------------------------------------------
Nicolas Delhomme
High Throughput Functional Genomics Center
European Molecular Biology Laboratory
Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
More information about the Bioc-sig-sequencing
mailing list