[Bioc-sig-seq] aggregating a RangedData over an IRanges object
Simon Anders
anders at ebi.ac.uk
Mon Jul 13 18:56:49 CEST 2009
Hi,
I've got RNA-Seq data in a wiggle file and exon coordinates in a GFF
file. Reading this in with rtracklayer's import function, I get the
wiggle file as UCSCData object and the GFF file as RangedData object.
I would now like to get the integral of the coverage for each exon.
So I extract the RangedData inside the UCSCData by coercion and get
> cvg.bl1 <- as( import( "igb/10-BL1-coverage.wig" ), "RangedData" )
> cvg.bl1
RangedData: 253256 ranges by 1 column on 1 sequence
colnames(1): score
names(1): chr10
and have an IRanges object with the exon boundaries here:
> exons <- import( "igb/10.gff" )
> exons <- exons[ exons$type == "exon", ]
> ranges(exons)[["10"]]
IRanges instance:
start end width
[1] 83769 83877 109
[2] 190766 190883 118
... ... ... ...
[27083] 135347172 135347681 510
Now, I would like to go through each exon interval and sum up the
coverage, i.e., do something like
aggregate( cvg.bl1, ranges(exons), sum )
This, however, does not work, as aggregate seems to be unable to deal
with a RangedData object as first parameter. This here, in contrast,
does give a result
aggregate( Rle( 1, 1000000000 ), ranges(exons), sum )
namely a vector of length 27083.
Now, I have three questions:
1. How can I get the vector referring to chr10 in cvg.bl1? Is there a
way to coerce to Rle or numeric? Supposedly, this only make sense if I
subset the RangedData object to only contain one chromosome.
2. How can I aggregate over the RangedData object? As it contains only
one chromosome, it should be possible to coerce it to an Rle vector, but
there is no coercion method, and, it should be possible anyway to avoid
this.
3. In the present case, 'ranges(exons)' contains a CompressedIRangesList
with 1 element, and not a simple IRanges object. If it contains more
than one element, it seems that aggregate calculates the aggregate
vector for each element and just concatenated them. Does this really
make sense?
Cheers
Simon
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-26 r48838)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.3.33 rtracklayer_1.5.7 RCurl_0.98-1 bitops_1.0-4.1
loaded via a namespace (and not attached):
[1] Biobase_2.5.4 Biostrings_2.13.24 BSgenome_1.13.10 tools_2.10.0
[5] XML_2.5-3
More information about the Bioc-sig-sequencing
mailing list