[Bioc-sig-seq] AlignedRead and complex subsetting

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 2 19:01:39 CEST 2009


Ivan Gregoretti wrote:
> Hello Steve, Martin and Everybody,
> 
> First, thank you both for your help.
> 
> As suggested by Steve, I look at
> 
> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html
> 
> by it didn't quite applied to my case. I need to iterate over
> chromosomes and I need to keep strand information.
> 
> I'll try this new algorithm Martin just wrote. By the way, Martin,
> what kind of object should rangesList be?

an instance of the RangesList class

> Say that I load my regions like this
> 
> myRegions <- import('myregions.bed')
> 
> Can I coerce that IRanges instance myRegions to the rangestList class? How?

I think ranges(myRegions).

I've forgotten what your sessionInfo() is; mine is

> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-09-01 r49517)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ShortRead_1.3.29   lattice_0.17-25    BSgenome_1.13.10
Biostrings_2.13.34
[5] IRanges_1.3.65     rtracklayer_1.5.12 RCurl_1.1-0        bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1   tools_2.10.0  XML_2.6-0


Martin

> Thank you!
> 
> Ivan
> 
> 
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
> 
> 
> 
> On Tue, Sep 1, 2009 at 7:41 PM, Martin Morgan<mtmorgan at fhcrc.org> wrote:
>> Hi Ivan --
>>
>> Steve Lianoglou wrote:
>>> Hi Ivan,
>>>
>>> On Aug 31, 2009, at 4:54 PM, Ivan Gregoretti wrote:
>>>
>>>> Hello Everybody,
>>>>
>>>> How do you subset an AlignedRead instance to keep (or reject) tags
>>>> that lay within a set of genomic regions?
>>>>
>>>>
>>>> Example
>>>>
>>>> Lets say that I have an AlignedRead instance called aln.
>>>>
>>>> Now let's say that I have a set of positions in BED style:
>>>>
>>>> (chromosome, start end)
>>>> ch1 1000000 1000050
>>>> chrX 20000000 20100000
>>>> ...(many more)...
>>>>
>>>> We can imagine that I have the BED set loaded as a data frame.
>>>>
>>>> Is it possible to pick from aln only the tags within (or outside) the
>>>> features defined in the table described above?
>>> I think that you should convert your BED file to an IRanges object, and
>>> use overlap with your ranges + your readAligned object to get what your
>> I wrote this, as a trial implementation of %in%. Is that useful? (also
>> !(... %in% ...) though that would, e.g., include NAs.
>>
>>
>> setMethod("%in%", c("AlignedRead", "RangesList"),
>>          function(x, table)
>> {
>>    ## consider only sensible alignemnts
>>    chr <- chromosome(x)
>>    pos <- position(x)
>>    wd <- width(x)
>>    notNA <- !(is.na(chr) | is.na(pos) | is.na(wd))
>>    ## find overlap
>>    chr <- chr[notNA]
>>    rl <- RangesList(mapply(IRanges, start=split(pos[notNA], chr),
>>                            width=split(wd[notNA], chr)))
>>    olap <- rl %in% table
>>    ## map to original indicies
>>    len <- seq_len(length(x))
>>    idx <- unlist(split(len[notNA], chr), use.names=FALSE)
>>    len %in% idx[unlist(olap)]
>> })
>>
>> One would then
>>
>>  aln[aln %in% rangesList]
>>
>> Martin
>>
>>> after. See Martin's post about something like this in this thread:
>>>
>>> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html
>>>
>>> To get the reads *outside* of your ranges, maybe you can call the
>>> ``gaps`` on your bed/ranges and then do the same thing ...  or perhaps
>>> ``setdiff(ranges, aln)`` might work, too? (where aln is your IRanges
>>> converted alignedRead object (if necessary)).
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>   |  Memorial Sloan-Kettering Cancer Center
>>>   |  Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list