[Bioc-sig-seq] `+` for GenomeData and coverage from several lanes
Simon Anders
anders at ebi.ac.uk
Tue Jun 30 17:32:05 CEST 2009
Hi Michael et al.
Assume I have a list of AlignedRead objects, with data from several
Solexa lanes. I would like to get a coverage vector over all the lanes.
As 'coverage' takes only one AlignedRead object, I have two possibilities:
(a) Concatenate the AlignedRead objects to a single big one. As far as I
can see, 'rbind2' is not implemented for AlignedRead, and 'append'
seemed very slow to me. It is probably not posible to do this without
making a copy of all the data in memory.
(b) Calculate the coverage for each AlignedRead object separately and
add up the GenomeData objects. The `+` operator is not supported for
GenomeData objects but it is for Rle objects. So I need to loop through
the chromosomes.
I've now written this short function for the purpose:
sumUpCoverage <- function( lanes, seqLens )
{
res <- NULL
for( i in 1:length(lanes) ) {
cvg <- coverage( lanes, width=seqLens )
if( is.null( res ) )
res <- cvg
else {
stopifnot( all( names(res) == names(cvg) ) )
for( seq in names(res) )
res[[seq]] <- res[[seq]] + cvg[[seq]]
}
}
res
}
This does not seem very elegant (and it takes 9 minutes, which, however,
might be ok for 29 mio reads). Any idea how to do it better? (The use of
'for' instead of 'sapply' is on purpose: I hope to save memory that way.)
And would it make sense to overload the `+` operator for GenomeData
objects? If so, could I suggest adding this to BSgenome?
Cheers
Simon
More information about the Bioc-sig-sequencing
mailing list