[BioC] New to Bioconductor is there a better way?
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Thu Mar 15 15:05:42 CET 2012
This is the way to do it.
There is also a convenience function called subsetByOverlaps(); you can
probably guess what it does.
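
A minimal sketch, assuming the snps and region data frames from the quoted
message below (rebuilt here so the snippet runs on its own). Carrying DAT
along as a metadata column, and the names kept, res and res_df, are
illustrative choices and not part of the original post.

```r
library(GenomicRanges)

# Example data from the quoted message, rebuilt so this runs standalone.
snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = c(295L, 640L, 670L, 100L, 1100L),
                   DAT = 1:5,
                   stringsAsFactors = FALSE)
region <- data.frame(CHR = c("1", "1", "2", "2", "2", "X"),
                     START = c(10L, 210L, 430L, 650L, 810L, 1090L),
                     STOP = c(100L, 350L, 630L, 675L, 850L, 1111L),
                     stringsAsFactors = FALSE)

# Each SNP becomes a width-1 range; DAT rides along as a metadata column
# (an assumption of this sketch, so the value survives the subsetting).
snplist <- with(snps, GRanges(CHR, IRanges(POS, POS), DAT = DAT))
locations <- with(region, GRanges(CHR, IRanges(START, STOP)))

# Option 1: keep the SNPs that overlap at least one region.
kept <- subsetByOverlaps(snplist, locations)
res <- data.frame(CHR = as.character(seqnames(kept)),
                  POS = start(kept),
                  DAT = mcols(kept)$DAT,
                  stringsAsFactors = FALSE)

# Option 2: index the original data frame with the hits from findOverlaps().
# unique() guards against a SNP that falls in more than one region.
olaps <- findOverlaps(snplist, locations)
res_df <- snps[sort(unique(queryHits(olaps))), ]
```

Option 2 keeps every column of the original data frame without copying them
into the GRanges object, which may be more convenient when the files carry
many extra columns.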
Kasper
On Thu, Mar 15, 2012 at 10:01 AM, Davis, Brian <Brian.Davis at uth.tmc.edu> wrote:
> I'm very new to Bioconductor (first time to use it) but not to R. I have a solution to my problem but being new to Bioconductor I'm wondering if there isn't a more appropriate/better way to solve my problem.
>
>
> I have a data frame of chromosome/position pairs (along with other data for each location). For each pair I need to determine whether it falls within any range in a given data frame of ranges. I need to keep only the pairs that are within any of the ranges for further processing.
>
> Example:
>
> snps<-NULL
> snps$CHR<-c("1","2","2","3","X")
> snps$POS<-as.integer(c(295,640,670,100,1100))
> snps$DAT<-seq(1:length(snps$CHR))
> snps<-as.data.frame(snps, stringsAsFactors=FALSE)
>
> snps
>   CHR  POS DAT
> 1   1  295   1
> 2   2  640   2
> 3   2  670   3
> 4   3  100   4
> 5   X 1100   5
>
> region<-NULL
> region$CHR<-c("1","1","2","2","2","X")
> region$START<-as.integer(c(10,210,430,650,810,1090))
> region$STOP<-as.integer(c(100,350,630,675,850,1111))
> region<-as.data.frame(region, stringsAsFactors=FALSE)
>
> region
>   CHR START STOP
> 1   1    10  100
> 2   1   210  350
> 3   2   430  630
> 4   2   650  675
> 5   2   810  850
> 6   X  1090 1111
>
> The result I need would look like
>
> Res
>
> CHR  POS DAT
>   1  295   1
>   2  670   3
>   X 1100   5
>
> My current data set is ~100K SNP entries, and my regions table has ~200K entries. I have ~1500 files to go through.
>
> My current solution is:
>
> library(GenomicRanges)
> snplist<-with(snps, GRanges(CHR, IRanges(POS, POS)))
> locations<-with(region, GRanges(CHR, IRanges(START, STOP)))
> olaps<-findOverlaps(snplist, locations)
>
> Then I can easily use olaps to subset as needed. I am just trying to see whether there are other functions or approaches for solving this, in an effort to learn.
>
> Thanks,
>
> Brian Davis