[BioC] reducing RangedData
Patrick Aboyoun
paboyoun at fhcrc.org
Thu Oct 22 09:58:36 CEST 2009
Robert,
I wrote you some code that you can use below. I've seen other reduction
operations where the user wanted to reduce a RangedData type object by
more than one column (e.g. strand and score), so perhaps an appropriate
signature for a reduce method for RangedData would be
reduce(x, by, ...)
If this method where written, the remaining question would be should it
support reducing transformations of the remaining values columns or
should non "by" columns be dropped from the output. I'm inclined towards
the latter, since a reduce(x, by, valuesFUN) method wouldn't be much
simpler than the rdapply method given below. Would a straight-forward
reduce(x = rd, by = "strand") approach for RangedData meet your needs?
> suppressMessages(library(IRanges))
> sta <- c(1, 2, 5, 6, 2, 3, 4, 5)
> end <- c(3, 4, 5, 7, 2, 4, 6, 7)
> str <- rep(c("+", "+", "-", "-"), 2)
> chr <- rep(c("chr1", "chr2"), each = 4)
> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
> rd
RangedData: 8 ranges by 1 columns
columns(1): strand
sequences(2): chr1 chr2
>
> # Interesting bit starts here
> reduceStranded <- function(x) {
+ strand <- x$strand
+ rngs <- ranges(x)[[1]]
+ ans <- IRangesList("+" = reduce(rngs[strand == "+"]),
+ "-" = reduce(rngs[strand == "-"]))
+ RangedData(unlist(ans, use.names = FALSE),
+ strand = rep(c("+", "-"), elementLengths(ans)),
space = names(x))
+ }
> params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x)
do.call(c, unname(x)))
> rdapply(params)
RangedData: 4 ranges by 1 columns
columns(1): strand
sequences(2): chr1 chr2
> as.data.frame(rdapply(params))
space start end width strand
1 chr1 1 4 4 +
2 chr1 5 7 3 -
3 chr2 2 4 3 +
4 chr2 4 7 4 -
> sessionInfo()
R version 2.9.2 Patched (2009-08-24 r49420)
i386-apple-darwin9.8.0
locale:
C/en_US.UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.2.3
Robert Castelo wrote:
> dear list,
>
> i'd like to project a set of exons on genomic space but keeping
> the strand information. let's assume for simplicity that no exons
> overlap in opposite strands so no conflicts need to be resolved
> regarding that case. in principle this is easy without keeping the
> strand using a RangeList and the reduce method. however i've been
> struggling for a while without success to figure out how can i
> "reduce" my coordinates without loosing the strand information.
>
> my guess is that to carry out the strand information i need to
> use a RangedData object:
>
> sta <- c(1, 1, 1)
> end <- c(3, 4, 5)
> str <- c("+", "+", "-")
> rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
> rd
> RangedData: 3 ranges by 1 columns
> columns(1): strand
> sequences(2): chr1 chr2
>
> and then use rdapply to reduce on each of the chromosomes but i
> either don't know how to directly apply reduce:
>
> params <- RDApplyParams(rd, reduce)
> rdapply(params)
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "reduce", for
> signature "RangedData"
>
>
> or i loose the strand information:
>
> params <- RDApplyParams(rd, function(rd) reduce(ranges(rd)))
> rdapply(params)
> $chr1
> RangesList: 1 range
> 1: 1:4
>
> $chr2
> RangesList: 1 range
> 1: 1:5
>
> thanks for your help!
> robert.
>
>
>
> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-apple-darwin8.11.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.4.1 RCurl_0.98-1
> [3] bitops_1.0-4.1 GOstats_2.10.0
> [5] graph_1.22.2 Category_2.10.1
> [7] geneplotter_1.22.0 lattice_0.17-25
> [9] limma_2.18.2 bgSpJncArrayDm1.db_1.0.1
> [11] org.Dm.eg.db_2.2.11 RSQLite_0.7-1
> [13] DBI_0.2-4
> BSgenome.Dmelanogaster.UCSC.dm1_1.0.0
> [15] BSgenome_1.12.3 RColorBrewer_1.0-2
> [17] Biostrings_2.12.8 IRanges_1.2.3
> [19] annotate_1.22.0 AnnotationDbi_1.6.1
> [21] Biobase_2.4.1 xtable_1.5-5
>
> loaded via a namespace (and not attached):
> [1] genefilter_1.24.2 GO.db_2.2.11 grid_2.9.1 GSEABase_1.6.1
> [5] RBGL_1.20.0 splines_2.9.1 survival_2.35-4 tools_2.9.1
> [9] XML_2.5-3
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list