[Bioc-sig-seq] weighted coverage on GRanges object

Hervé Pagès hpages at fhcrc.org
Fri May 13 19:41:46 CEST 2011


Hi Janet,

Not sure it makes sense that the 'weight' arg for the coverage,GRanges
method is required to be a list to start with. Generally speaking, one
would like to be able to associate a weight to each range, and this can
simply be achieved by passing a numeric vector of length the number of
ranges. If the user wants to define the weights on a per-chromosome
basis, then this would better be done before, when preparing the weight
vector, and it's easy.

Anyway, I'll keep support for list but will also add support for numeric
vector.

Thanks for your feedback!
H.


On 11-05-06 12:48 PM, Janet Young wrote:
> Hi there,
>
> I have a suggestion - is it possible to make it a little easier to directly calculate weighted coverage on a GRanges object that has regions on>1 chromosome?   I think the code below explains what I mean.
>
> I've figured out a workaround, but it took me quite a while to get there - perhaps putting the coercion in the underlying coverage/weights code will help others avoid that in future?
>
> thanks very much,
>
> Janet
>
> -------------------------------------------------------------------
>
> Dr. Janet Young (Trask lab)
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung  ...at...  fhcrc.org
>
> http://www.fhcrc.org/labs/trask/
>
> -------------------------------------------------------------------
>
>
> library(GenomicRanges)
>
>
> dat1<- GRanges(seqnames="chr2L",IRanges(start=((1:4)*5),width=20),score=c(3,5,10,20) )
>
> dat2<- GRanges(seqnames=rep(c("chr2L","chr3L"),c(4,3)),IRanges(start=c(  (1:4)*5 , (1:3)*5 ),width=20),score=c(3,5,10,20,6,10,30) )
> seqlengths(dat2)<- c(50,40)
>
> #### this works
> coverage(dat1,weight=list(elementMetadata(dat1)$score))
>
> ## SimpleRleList of length 1
> ## $chr2L
> ## 'integer' Rle of length 39 with 8 runs
> ##   Lengths:  4  5  5  5  5  5  5  5
> ##   Values :  0  3  8 18 38 35 30 20
>
>
> #### this works
> myscores<- list(c(3,5,10,20), c(6,10,30))
> coverage(dat2,weight=myscores)
>
> ## SimpleRleList of length 2
> ## $chr2L
> ## 'integer' Rle of length 50 with 9 runs
> ##   Lengths:  4  5  5  5  5  5  5  5 11
> ##   Values :  0  3  8 18 38 35 30 20  0
>
> ## $chr3L
> ## 'integer' Rle of length 40 with 7 runs
> ##   Lengths:  4  5  5 10  5  5  6
> ##   Values :  0  6 16 46 40 30  0
>
> ##### but it'd be great if this could work too....
> coverage(dat2,weight=list(elementMetadata(dat2)$score))
> ## Error in normargWeight(weight, nseq) : 'weight' is longer than 'x'
>
> ### for now I'll coerce my scores into a list by chromosome: does this seem to be a reliable way to do it?  this is the part it took me a while to figure out on my own...
> myscores<- split( elementMetadata(dat2)$score , as.character(seqnames(dat2)) )
> coverage(dat2,weight=myscores)
>
> sessionInfo()
>
> R version 2.13.0 (2011-04-13)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_1.4.3 IRanges_1.10.0
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-sig-sequencing mailing list