[Bioc-sig-seq] Coverage for RangeList?
Patrick Aboyoun
paboyoun at fhcrc.org
Tue Apr 20 05:24:49 CEST 2010
Jane,
In BioC 2.5 (release) / R 2.10 and soon to be released BioC 2.6 / R
2.11, the man page for coverage in the IRanges package describes the
shift and width arguments, which allow you to vary the bounds for each
chromosome/contig within the GRanges/RangesList/RangedData object. As
Paul mentioned, the source of the data can influence how you choose to
represent your data so it is tricky to speak in generalities. If you
provide a workflow begining with the data source, we can provide you
code that should work for you.
Cheers,
Patrick
On 4/19/10 6:54 PM, Paul Leo wrote:
>
> There are a couple of how too's about this but briefly it
>
> Say you go a bed file of reads of "whatever you want to get the
> coverage of" ... the bed file is a dataframe with start, end ,and
> chromosome info:
> Otherwise you can use ShortRead to get the RangeData object of reads...
> whatever you a a rangeList or rangeData object of what you are
> counting..
>
>
> Turn the bed file, (as a dataframe) into a Ranges Data object: You can
> use a RangeList if you want ...
>
>
> rl<-RangedData(IRanges(start=bed[,"start"],end=bed[,"end"]),space=bed[,"chr"])
> ________________________________________
>
> So this has chromosomes in the space section..
>
> You the want coverage on a set of "other regions".... say the exons,
> chromosomes, Cpg islands ... whatever:
>
> Eg for exons:
> library(GenomicFeatures.Mmusculus.UCSC.mm9)
> library(org.Mm.eg.db)
>
> genes<-geneMouse()
> the.exons<-exons_deprecated(genes)
>
> "the.exons" is not a ranged data object: (a RangeList can also be used)
> of UCSC exons or name you own..
>
> the.exons<- RangesList(
> chr10 =
> IRanges(start=starts[starts[,"chrom"]=="chr10","txStart"],end=starts[starts[,"chrom"]=="chr10","txStart"]),
> chr11 =
> IRanges(start=starts[starts[,"chrom"]=="chr11","txStart"],end=starts[starts[,"chrom"]=="chr11","txStart"]),
> chr12 =
> IRanges(start=starts[starts[,"chrom"]=="chr12","txStart"],end=starts[starts[,"chrom"]=="chr12","txStart"]),
> ....)
>
> so in summary the.exons is a rangeData or Rangelist objuct of "the
> regions you want the coverage on" (al las start and end of
> lapply(ranges(RangedData),coverage, start = 1, end = 1000)
>
>
> ________________________________________-
>
> ## do coverage calculation
>
> library(BSgenome.Mmusculus.UCSC.mm9)
> mouse.chromlens<-seqlengths(Mmusculus)
>
> Get a rleList objust of the coverage over the genome:
>
> cov<-coverage(rl,width=mouse.chromlens) ## list names are the chromosome
> names/spcae names in rl...you don't have to use "width"
> ##
>
> cov<-cov[names(the.exons)] # make sure list in the same order can this not checked in the line below please check the chromosome names are consistentin rl and the.exons
>
> you now have the coverge on all the chromosomes.... but for other specified regions use:
>
>
> regionViews<- RleViewsList(rleList = cov, rangesList = ranges(the.exons)) ###
>
>
> ## your done!
> ____________________________________________
>
> ##dumP the reults as a dataframe::
> the.counts<-{}
> order.chromos<-names(the.exons)
> for (i in 1:length(order.chromos)){
> chromo<-order.chromos[i]
> the.counts<-rbind(the.counts,cbind(chromo,start(the.exons[chromo]),end(the.exons[chromo]), viewSums(regionViews[[chromo]]) ))
> }
> colnames(the.counts)<- c("chr","start","end","count")
>
> Cheers
> Paul
>
>
>
>
>
> -----Original Message-----
> From: Jane Landolin<landolin at gmail.com>
> To: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] Coverage for RangeList?
> Date: Mon, 19 Apr 2010 16:40:54 -0700
>
>
> Hi all,
>
> I just joined the list, and was hoping to get some advice about the
> best way to work with the various classes that deal with genomic range
> info. I am a postdoc in the Celniker Lab at Lawrence Berkeley National
> Lab, working on the modENCODE project.
>
> So my question has to do with using various functions that operate on
> the IRanges level, while keeping my data in RangedData or RangeList
> objects. I should be able to use rdapply or lapply, but it is not
> very clear to me when I should use one over another.
>
> If I want to calculate the chromosome-by-chromosome coverage for a
> RangeData object, I can do something like:
>
> lapply(ranges(RangedData),coverage, start = 1, end = 1000)
>
> but how can I do it for varying starts and ends?
>
> Alternatively, is it better to write a function and call it using
> rdapply? What have you guys been doing for this scenario?
>
>
> Thanks in advance,
> Jane Landolin
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
More information about the Bioc-sig-sequencing
mailing list