[Bioc-sig-seq] getting integers from SimpleRleViewsList
Janet Young
jayoung at fhcrc.org
Wed Jun 9 02:10:36 CEST 2010
Hi,
This is probably a simple question, but I can't quite figure it out
for myself. I've been using RangedData (etc) quite a lot and have
often found myself looping through the spaces when I probably don't
need to. I'm trying not to do that now...
I have some scores on a bunch of genomic intervals I'm interested in
(they're conservation scores in 4kb promoter regions, e.g. phastCons
scores). I've stored these as a SimpleRleList object.
I also have some smaller subregions (matches to transcription factor
motifs) and I want to compare the distribution of scores inside those
subregions to scores outside those subregions. I have the regions as
RangedData objects, over multiple spaces. I want to do this on
individual base pair scores, rather than by taking subregion means and
comparing those (so I don't want to use viewSums).
I've managed to get the scores inside and outside the regions using
the RleViewsList function (following the example in this thread: http://thread.gmane.org/gmane.comp.lang.r.sequencing/403/focus=408
). The scores are now as SimpleRleViewsList.
I'm currently looping through the scores (the SimpleRleViewsList)
space by space, converting to simple vector and concatenating the
scores, but I imagine there's a way to do it without looping. Any ideas?
Below is a simple example (extended from the older thread). My real
scores are not integers and there's many more of them.
#########################
library(IRanges)
### generate some arbitrary scores
track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6),
width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3,
4))) )
trackCoverage <- coverage(track,
weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
### define subregions
exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2,
2)),chrB = IRanges(start = 3, width = 5))
###get exon and intron scores
exonView <- RleViewsList(rleList=trackCoverage, rangesList=exons)
intronView <- RleViewsList(rleList=trackCoverage,
rangesList=gaps(ranges(RangedData(exons)),start=1,end=9))
### now convert those to simple integer objects. Loop through
chromosomes. Can I do this without looping?
### (I tried using unlist, but it does something weird to the scores,
as it assumes they're all from the same space)
exonScores <- integer()
intronScores <- integer()
for (chr in names(exonView) ) {
thischrexonscores <- as.integer(seqselect(trackCoverage[[chr]],
start=start(exonView[[chr]]),end=end(exonView[[chr]]) ))
exonScores <- c(exonScores,thischrexonscores)
thischrintronscores <-
as.integer(seqselect(trackCoverage[[chr]],
start=start(intronView[[chr]]),end=end(intronView[[chr]]) ))
intronScores <- c(intronScores,thischrintronscores)
}
###and do something with the scores to compare distributions
wilcox.test(exonScores,intronScores)
#########################
thanks,
Janet
-------------------------------------------------------------------
Dr. Janet Young (Trask lab)
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
http://www.fhcrc.org/labs/trask/
More information about the Bioc-sig-sequencing
mailing list