[Bioc-sig-seq] Short read overlap function
Joern Toedling
Joern.Toedling at curie.fr
Fri Jul 24 17:28:36 CEST 2009
Hello,
The functions you mention certainly qualify as 'proper' overlap functions.
What you probably want is a relatively straightforward post-processing of
their result. Below is a function I have written that restricts the
overlapping pairs to those that overlap by at least a specified fraction of
the shorter interval's length. Setting this fraction to 1.0 returns only
pairs in which one of the intervals is contained in the other one. The
function uses genomeIntervals, but I am sure that post-processing the
IRanges result is equally straightforward.
Hope this helps,
Joern
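fracOverlap <- function(I1, I2, min.frac=1.0){
  # Return the pairs of intervals from I1 and I2 that overlap by at least
  # min.frac of the shorter interval's length
  require("genomeIntervals")
  stopifnot(inherits(I1,"Genome_intervals"),
            inherits(I2,"Genome_intervals"))
  ov <- interval_overlap(I1,I2)
  # expand the overlap list into index pairs (index in I1, index in I2)
  lens <- sapply(ov, length)
  overlap1 <- rep(seq_along(ov), lens)
  overlap2 <- unlist(ov, use.names=FALSE)
  # get base pair overlap (the +1 assumes closed, integer coordinates)
  left <- pmax(I1[overlap1,1], I2[overlap2,1])
  right <- pmin(I1[overlap1,2], I2[overlap2,2])
  stopifnot(all(right >= left))
  bases <- right-left+1
  # length of the shorter interval in each pair
  min.len <- pmin(I1[overlap1,2]- I1[overlap1,1]+1,
                  I2[overlap2,2]- I2[overlap2,1]+1)
  frac <- bases/min.len
  res <- data.frame("Index1"=overlap1, "Index2"=overlap2,
                    "n"=bases, "fraction"=round(frac, digits=2))
  # filter on the unrounded fraction, so that e.g. 0.996 does not
  # pass min.frac=1.0 after rounding
  res <- res[frac >= min.frac, ]
  return(res)
}# fracOverlap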
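To make the fraction rule concrete, here is a minimal sketch in base R that applies the same arithmetic to plain start/end numbers, without any interval package. The helper name overlapFraction and the toy coordinates are made up for illustration and are not part of genomeIntervals or IRanges; closed, integer coordinates are assumed.

```r
# Hypothetical helper (not from genomeIntervals or IRanges): fraction of the
# shorter interval that is covered by the overlap, assuming closed, integer
# coordinates.
overlapFraction <- function(start1, end1, start2, end2) {
  left  <- pmax(start1, start2)
  right <- pmin(end1, end2)
  bases <- pmax(right - left + 1, 0)  # 0 when the intervals do not overlap
  min.len <- pmin(end1 - start1 + 1, end2 - start2 + 1)
  bases / min.len
}

# Range 1 is [1, 20]; Range 2 is first contained in it, then only
# partially overlapping it (toy coordinates).
contained <- overlapFraction(1, 20, 5, 15)
partial   <- overlapFraction(1, 20, 15, 25)
print(c(contained = contained, partial = partial))
```

With a threshold of 1.0, only the contained pair would be kept; the partially overlapping pair covers 6 of the 11 bases of the shorter interval and would be dropped.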
On Fri, 24 Jul 2009 16:47:04 +0200, Johannes Waage wrote
> Hi all,
>
> In assigning RNA-seq data to exon-models, I'm looking for a proper overlap
> function. Both IRanges and genomeIntervals have overlap functions,
> but as far as I can see, these don't have options for contained
> overlaps, example:
>
> |-------Range 1-------]
>    [----Range 2----]
>
> IRanges, genomeIntervals: TRUE
> Wanted: TRUE
>
> |-------Range 1-------]
>               [----Range 2----]
>
> IRanges, genomeIntervals: TRUE
> Wanted: FALSE
>
> Any suggestions are appreciated!
>
> Regards,
> Johannes Waage,
> Uni. of Copenhagen
---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246926