[Bioc-sig-seq] GenomicFeatures vignette, question on findOverlaps results?
Patrick Aboyoun
paboyoun at fhcrc.org
Mon Apr 5 20:35:51 CEST 2010
P. Terry,
In your example, the GR_txdb object is the query, not the subject, in
the overlap calculation. The following code should get you what you are
looking for:
OL<- findOverlaps(GR_txdb, r_gr_ChSeqPks)
tdata<- GR_txdb[queryHits(OL),]
Patrick
On 4/5/10 11:02 AM, pterry at huskers.unl.edu wrote:
> Dear bioc-sig-sequencing,
>
> Working thru the GenomicFeatures vignette dated 03/27/10 with an Arabidopsis example, I applied 'findOverlaps' to learn which chipseq peaks will overlap with Arabidopsis transcrips. In attempting to manually verify the results from 'findOverlaps', none of the ranges in the chipSeq peaks list overlaps the ranges from for example, the first record of the 'findOverlaps' output.
>
> That is, the lowest address range in the chipSeq peaks (all chr1) list is 617092-617094. The 1st recond in the output from 'findOverlaps' is the transcript, 28500-28706. Can someone offer an explanation of why 'findOverlaps' reports a transcript with this location as an overlap for any of the peaks in the chipSeq peak list?
>
>
>
>> GR_txdb<- transcripts(mart4_at_eg_gene)
>> GR_txdb
>>
> GRanges with 39640 ranges and 2 elementMetadata values
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> |<integer> <character>
> [1] 1 [ 3631, 5899] + | 5480 AT1G01010.1-TAIR
> [2] 1 [23146, 31227] + | 3216 AT1G01040.1-TAIR
> [3] 1 [28500, 28706] + | 8461 AT1G01046.1-TAIR
> [4] 1 [44677, 44787] + | 3566 AT1G01073.1-TAIR
> ...
>
>> r_gr_ChSeqPks
>>
> GRanges with 57 ranges and 0 elementMetadata values
> seqnames ranges strand |
> <Rle> <IRanges> <Rle> |
> [1] 1 [ 617092, 617094] * |
> [2] 1 [1808262, 1808262] * |
> [3] 1 [3889445, 3889452] * |
> [4] 1 [4404410, 4404410] * |
> ...
>
>> OL<- findOverlaps(GR_txdb, r_gr_ChSeqPks)
>> tdata<- GR_txdb[subjectHits(OL),]
>> tdata
>>
> GRanges with 43 ranges and 2 elementMetadata values
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> |<integer> <character>
> [1] 1 [ 28500, 28706] + | 8461 AT1G01046.1-TAIR
> [2] 1 [ 88898, 89745] + | 6886 AT1G01210.1-TAIR
> [3] 1 [ 91750, 95651] + | 4400 AT1G01220.1-TAIR
> [4] 1 [ 95987, 97407] + | 7355 AT1G01225.1-TAIR
> ...
>
>> length(tdata)
>>
> [1] 43
>
>
>> sessionInfo()
>>
> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_2.3.5 GenomicFeatures_0.5.0 GenomicRanges_0.1.0
> [4] IRanges_1.5.73
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.7.5 Biostrings_2.15.26 BSgenome_1.15.20 DBI_0.2-5
> [5] RCurl_1.3-1 RSQLite_0.8-4 rtracklayer_1.7.11 tools_2.12.0
> [9] XML_2.8-1
>
>>
>
> Thanks,
> P. Terry
> pterry at huskers.unl.edu
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
More information about the Bioc-sig-sequencing
mailing list