[Bioc-sig-seq] Comparing two chipseq position sets
Steve Goldstein
steveg at lmcg.wisc.edu
Thu May 7 16:53:16 CEST 2009
I just skimmed through the documentation for genomeIntervals and
IRanges. They both seem to implement the first step of a comparison ---
basic set operations like intersection, union, "overlap," etc. --- but
it doesn't look like they have tests of significance. Are there
packages that implement permutation tests for sets of genome intervals?
A simple permutation test could be done by selecting random sets of
intervals "matching" the query intervals and counting the number of
overlaps with the reference intervals. Each random set of intervals
could be picked so that the number and sizes of the intervals are the
same as in the query. A general implementation of the method would need
to know the length of each chromosome.
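
A minimal sketch of the test described above, in plain Python rather
than R and using only the standard library. All of the names here
(count_overlaps, random_matched_set, permutation_p_value) are
hypothetical and not taken from genomeIntervals or IRanges. It assumes
intervals are (chrom, start, end) tuples with half-open coordinates,
places each random interval uniformly over all positions where it fits,
and reports a one-sided empirical p-value with the usual +1 correction.

```python
import random


def count_overlaps(query, reference):
    """Count query intervals that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        # Linear scan per query interval: fine for a sketch, slow for real data.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits


def random_matched_set(query, chrom_lengths, rng):
    """Draw one random interval per query interval, keeping each width."""
    out = []
    for _chrom, start, end in query:
        width = end - start
        eligible = [c for c in chrom_lengths if chrom_lengths[c] >= width]
        if not eligible:
            raise ValueError("no chromosome is long enough for width %d" % width)
        # Weight each chromosome by its number of valid start positions, so
        # that every admissible placement in the genome is equally likely.
        weights = [chrom_lengths[c] - width + 1 for c in eligible]
        chrom = rng.choices(eligible, weights=weights)[0]
        new_start = rng.randint(0, chrom_lengths[chrom] - width)
        out.append((chrom, new_start, new_start + width))
    return out


def permutation_p_value(query, reference, chrom_lengths, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    at_least = 0
    for _ in range(n_perm):
        shuffled = random_matched_set(query, chrom_lengths, rng)
        if count_overlaps(shuffled, reference) >= observed:
            at_least += 1
    # The +1 in numerator and denominator keeps the p-value above zero.
    return observed, (at_least + 1) / (n_perm + 1)
```

With made-up coordinates, a call might look like
permutation_p_value(query, reference, {"chr1": 100000, "chr2": 50000}),
where query and reference are lists of (chrom, start, end) tuples. A
real implementation would want an interval tree or sorted search in
place of the linear scan in count_overlaps.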
The implementation should also give the user the ability to specify
excluded regions of the genome. For example, if the coordinates are
derived from aligning reads to the genome, intervals intersecting
sequence gaps would not be admissible.
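
One way to honor excluded regions is rejection sampling: draw a
candidate position and redraw it if it touches an excluded region. The
sketch below is Python using only the standard library. The names
(overlaps_any, place_interval, max_tries) are hypothetical, and it again
assumes half-open (chrom, start, end) tuples.

```python
import random


def overlaps_any(chrom, start, end, regions):
    """True if [start, end) on chrom overlaps any (chrom, start, end) region."""
    return any(c == chrom and s < end and start < e for c, s, e in regions)


def place_interval(width, chrom_lengths, excluded, rng, max_tries=10000):
    """Pick a random interval of the given width that avoids excluded regions."""
    eligible = [c for c in chrom_lengths if chrom_lengths[c] >= width]
    if not eligible:
        raise ValueError("no chromosome is long enough for width %d" % width)
    # Weight chromosomes by their number of valid start positions.
    weights = [chrom_lengths[c] - width + 1 for c in eligible]
    for _ in range(max_tries):
        chrom = rng.choices(eligible, weights=weights)[0]
        start = rng.randint(0, chrom_lengths[chrom] - width)
        # Reject the candidate if it intersects an excluded region (e.g. a gap).
        if not overlaps_any(chrom, start, start + width, excluded):
            return (chrom, start, start + width)
    raise RuntimeError("could not place interval after %d tries" % max_tries)
```

Rejection sampling is simple and leaves the placement uniform over the
admissible positions, but it gets slow when most of the genome is
excluded. In that case, sampling directly from the admissible segments
would be a better choice.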
Of course, if the null hypothesis for this permutation test (the sets
of intervals are not related) is rejected, then you have to think about
the next questions: To what degree are the sets related? Where do they
differ and where are they the same?
Ivan Gregoretti wrote:
> Hello Steve, Nicolas and Michael,
>
> I agree with all of you: it is not a trivial question.
>
> I asked the bioc-sig-seq list because I thought, "Hey, this must
> be an everyday question for the genome analyst."
>
> Say you ran your chipseq under condition A and then you ran it under
> condition B. Then you have to decide whether A and B made any
> difference. It doesn't get any simpler than that!
>
> I can't compare the two means or the two dispersions. I have to
> compare pairs. The problem is that it is not trivial to unambiguously
> determine which spot in B must be paired with each spot in A. To start
> with, A and B may have different numbers of loci (e.g. 15000 versus
> 18000).
>
> I'll take a look at genomeIntervals and IRanges.
>
>