[Bioc-sig-seq] Comparing two chipseq position sets

Steve Goldstein steveg at lmcg.wisc.edu
Thu May 7 16:53:16 CEST 2009


I just skimmed through the documentation for genomeIntervals and 
IRanges.  They both seem to implement the first step of a comparison --- 
basic set operations like intersection, union, "overlap," etc. --- but 
it doesn't look like they have tests of significance.  Are there 
packages that implement permutation tests for sets of genome intervals? 

A simple permutation test could be done by selecting random sets of 
intervals "matching" the query intervals and counting the number of 
overlaps with the reference intervals.  Each random set of intervals 
could be picked so that the number and sizes of the intervals were the
same as in the query.  A general implementation of the method would need 
to know the length of each chromosome. 
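Below is a minimal Python sketch of that permutation test. It is not taken from genomeIntervals or IRanges, and every name in it is hypothetical. It assumes intervals are half-open (chrom, start, end) tuples and that each randomized interval stays on its original chromosome with a uniformly drawn start. The test statistic is assumed to be the number of query intervals that overlap at least one reference interval. The p-value uses the usual (hits + 1) / (n_perm + 1) correction so it is never exactly zero.

```python
import bisect
import random


def build_index(reference):
    """Per chromosome: reference starts sorted ascending, plus the running
    maximum of their ends (lets us test overlap with one binary search)."""
    grouped = {}
    for chrom, start, end in reference:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        prefix_max_end, running = [], float("-inf")
        for _, e in ivs:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def count_overlaps(query, index):
    """Number of query intervals overlapping at least one reference interval."""
    n = 0
    for chrom, qs, qe in query:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        k = bisect.bisect_left(starts, qe)  # references that start before qe
        if k > 0 and prefix_max_end[k - 1] > qs:  # ...and one ends after qs
            n += 1
    return n


def random_matched_set(query, chrom_lengths, rng):
    """Same number and widths as the query; each interval keeps its
    chromosome but gets a uniform random start (an assumption)."""
    out = []
    for chrom, start, end in query:
        width = end - start
        new_start = rng.randrange(0, chrom_lengths[chrom] - width + 1)
        out.append((chrom, new_start, new_start + width))
    return out


def permutation_test(query, reference, chrom_lengths, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided permutation p-value)."""
    rng = random.Random(seed)
    index = build_index(reference)
    observed = count_overlaps(query, index)
    hits = sum(
        count_overlaps(random_matched_set(query, chrom_lengths, rng), index) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    reference = [("chr1", 100, 200), ("chr1", 500, 600)]
    query = [("chr1", 150, 160), ("chr1", 590, 610), ("chr1", 800, 900)]
    print(permutation_test(query, reference, {"chr1": 1000}, n_perm=200))
```

Only the reference intervals are indexed, so each permutation costs one binary search per query interval. That matters when the query has on the order of 15000 to 18000 loci and hundreds of permutations are run.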

The implementation should also give the user the ability to specify 
excluded regions of the genome.  For example, if the coordinates are 
derived from aligning reads to the genome, intervals intersecting 
sequence gaps would not be admissible.
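One simple way to honor excluded regions is rejection sampling. You draw a start uniformly, then redraw whenever the placed interval touches an excluded region such as a sequence gap. Given acceptance, the start is uniform over the admissible positions. The Python sketch below shows this for a single chromosome. The function names, the half-open (start, end) convention, and the `max_tries` cutoff are all hypothetical.

```python
import random


def overlaps_any(start, end, regions):
    """True if half-open [start, end) overlaps any half-open (rs, re) in regions."""
    return any(rs < end and start < re for rs, re in regions)


def place_avoiding_gaps(width, chrom_length, excluded, rng, max_tries=10_000):
    """Draw a random placement of `width` bases on one chromosome, rejecting
    any placement that intersects an excluded region."""
    for _ in range(max_tries):
        start = rng.randrange(0, chrom_length - width + 1)
        if not overlaps_any(start, start + width, excluded):
            return start, start + width
    raise RuntimeError("no admissible placement found; is most of the chromosome excluded?")


if __name__ == "__main__":
    rng = random.Random(0)
    # Only starts 400..500 keep a 100-base interval clear of both gaps.
    print(place_avoiding_gaps(100, 1000, [(0, 400), (600, 1000)], rng))
```

Rejection sampling is slow when most of a chromosome is excluded. In that case, enumerating the admissible start ranges and sampling from them directly would be the better design.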

Of course, if the null hypothesis for this permutation test (that the sets
of intervals are not related) is rejected, then you have to think about the
next questions:  To what degree are the sets related?  Where do they
differ and where are they the same?


Ivan Gregoretti wrote:
> Hello Steve, Nicolas and Michael,
>
> I agree with all of you: it is not a trivial question.
>
> I asked the bioc-sig-seq listers because I thought, "Hey, this must
> be an everyday question for the genome analyst."
>
> Say you ran your chipseq under condition A and then you ran it under
> condition B. Then you have to decide whether A and B made any
> difference. It doesn't get any simpler than that!
>
> I can't compare the two means or the two dispersions. I have to
> compare pairs. The problem is that it is not trivial to unambiguously
> determine which spot in B must be paired with each spot in A. To start
> with, A and B may have different numbers of loci (i.e., 15000 versus
> 18000).
>
> I'll take a look at genomeIntervals and IRanges.
>   
>
