[Bioc-sig-seq] Overlap of multiple RangedData instances.

Ivan Gregoretti ivangreg at gmail.com
Tue Mar 16 22:48:23 CET 2010


I see. It never occurred to me to try a consolidation with labels.
I'll try it and see if it looks simpler (and perhaps faster) than my
'for loops'.

Thank you,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878



On Tue, Mar 16, 2010 at 5:40 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> One approach:
>
> Combine the ranges into a single RangedData, with a factor tracking their
> source. Then do a findOverlaps, asking for all overlaps. Get the source from
> the subject indices and call table() with the indices in A and the source
> variable. Then you have something like:
>
> A index    B    C    D
> 1          2    3    0
> 2          1    2    1
> 3          0    5    2
>
> Etc.
>
> Then just do rowSums(tab > 0) == 3 to see which hit all three sources.
>
> Michael
>
> On Tue, Mar 16, 2010 at 1:55 PM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>
>> Hi Michael,
>>
>> I think that I expressed myself too ambiguously.
>>
>> I am looking for the multiple intersection. That is, I am looking for
>> the ranges in A that are overlapped, even minimally, by ranges in ALL
>> the rest of the RangedData instances. A super-dee-duper intersection.
>>
>> The union() operation would find the ranges that overlap with ANY of
>> the rest of the RangedData instances.
>>
>> At the moment, like Karl, I am evading the problem by dint of shameful
>> for loops.
>>
>> Do you think that there is a more elegant way?
>>
>> Thanks
>>
>> Ivan
>>
>>
>> On Tue, Mar 16, 2010 at 4:26 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> >
>> >
>> > On Tue, Mar 16, 2010 at 12:34 PM, Ivan Gregoretti <ivangreg at gmail.com>
>> > wrote:
>> >>
>> >> Hello everybody,
>> >>
>> >> Say A, B, C, D, .... are all RangedData instances.
>> >>
>> >> How do you come up with the list of ranges in A that have at least
>> >> some overlap with B, C, D, ...?
>> >>
>> >> I want to calculate the multiple intersection ignoring the extent of
>> >> the overlap.
>> >>
>> >> ?findOverlaps does not hint how to recover the ranges from A.
>> >>
>> >
>> > Something like:
>> > ranges(A) %in% union(union(ranges(B), ranges(C)), ranges(D))
>> >
>> > That syntax is a little verbose. I propose adding some operators to
>> > Ranges,
>> > RangesList and RangedData objects.
>> >
>> > "+" for union()
>> > "-" for setdiff()
>> > "!" for gaps()
>> >
>> > Then we could have:
>> > A %in% (B + C + D)
>> >
>> > What do people think?
>> >
>> >
>> >>
>> >> Thank you,
>> >>
>> >> Ivan
>> >>
>> >
>> >
>
>
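
For reference, the consolidation approach quoted above (combine the other
sets with a source factor, find all overlaps, table() the A index against
the source, then rowSums) might be sketched roughly as follows. This is
only a base-R illustration under stated assumptions: the ranges are
invented toy data held in plain data frames, and the outer() comparison is
a brute-force stand-in for findOverlaps(), not the IRanges implementation.
The names others, src, ov, hits, in_all and in_any are hypothetical.

```r
# Toy closed intervals; all values are invented for illustration.
A <- data.frame(start = c(1, 20, 40),  end = c(10, 30, 50))
B <- data.frame(start = c(5, 25),      end = c(8, 28))
C <- data.frame(start = c(9, 22, 100), end = c(12, 24, 110))
D <- data.frame(start = c(2, 45),      end = c(3, 60))

# Step 1: consolidate B, C and D, with a factor tracking the source.
others <- rbind(B, C, D)
src <- factor(rep(c("B", "C", "D"), times = c(nrow(B), nrow(C), nrow(D))))

# Step 2: all overlaps between A (query) and the combined set (subject).
# Stand-in for findOverlaps(): closed intervals i and j overlap when
# start_i <= end_j and end_i >= start_j.
ov <- outer(A$start, others$end, "<=") & outer(A$end, others$start, ">=")
hits <- which(ov, arr.ind = TRUE)  # "row" = index in A, "col" = index in others

# Step 3: cross-tabulate the A index against the source of each hit.
# Fixing the factor levels keeps ranges of A that have no hits at all.
tab <- table(factor(hits[, "row"], levels = seq_len(nrow(A))),
             src[hits[, "col"]])

# Step 4: ranges of A hit by ALL sources, and by ANY source.
in_all <- rowSums(tab > 0) == nlevels(src)
in_any <- rowSums(tab > 0) > 0

A[in_all, ]
```

Comparing against nlevels(src) rather than a literal 3 keeps the last step
unchanged when more sets are added. The same table also answers the ANY
question from earlier in the thread via in_any.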
