[Bioc-sig-seq] GRanges, failure assigning chromosome lengths

Martin Morgan mtmorgan at fhcrc.org
Sat Sep 4 00:28:34 CEST 2010


On 09/03/2010 03:07 PM, Chris Seidel wrote:
> Did anything ever get resolved in terms of assigning chromosome lengths
> to a GRanges object when it contains alignments that run off the
> chromosome ends? The message below was the last of the original thread
> that I could find.
>
> I'm currently reading Solexa export files into a GRanges object, and
> setting the chromosome lengths sometimes fails with an error when the
> object has a few reads that extend past a chromosome boundary. The only
> solution I see is to toss out the offending reads, which means writing
> a complicated function that loops through all reads and checks each
> against the chromosome length. Since Ivan brought this problem up back
> in April, I was wondering whether a solution was ever reached (or
> whether anyone knows of an efficient way to address the problem).
>
There is also this thread

https://stat.ethz.ch/pipermail/bioconductor/2010-August/034876.html

It might be as easy as

  gr0 = as(aln, "GRanges"); gr = gr0[gr0 %in% seqs]

where seqs is a RangesList constructed from the chromosome lengths.
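
For anyone who wants to see the idea without an explicit loop over reads, here is a minimal base-R sketch of the same kind of filter. It is not the GRanges code itself: the vectors chrom, start, end and the lengths in chrlen are made-up toy values, and I am assuming that for a real object they would be pulled out with as.character(seqnames(gr0)), start(gr0) and end(gr0).

```r
# Hypothetical toy reads; in practice these vectors would come from the
# GRanges object (assumed: seqnames(), start(), end() accessors).
chrom <- c("chr1", "chr1", "chr2", "chrM", "chrM")
start <- c(1L, 90L, 40L, 100L, 16560L)
end   <- start + 35L                      # 36-bp reads

# Assumed chromosome lengths, named by chromosome.
chrlen <- c(chr1 = 100L, chr2 = 80L, chrM = 16571L)

# Vectorized lookup: one length per read, matched by chromosome name.
limit  <- unname(chrlen[chrom])

# A read is kept only if it lies entirely within its chromosome.
inside <- start >= 1L & end <= limit

kept    <- data.frame(chrom, start, end)[inside, ]
dropped <- table(chrom[!inside])          # overhanging reads per chromosome

print(kept)
print(dropped)
```

The per-chromosome table is only there to show where the overhanging reads fall, e.g. whether they are all on chrM.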

A common source of this problem is mapping to mitochondria or other
circular genomes (hence looking for overhanging alignments on chrM alone
might be enough); this is being actively worked on, but is a deeper issue
than it appears at first blush.

Martin
> -Chris
>
>> -----Original Message-----
>> From: bioc-sig-sequencing-bounces at r-project.org
>> [mailto:bioc-sig-sequencing-bounces at r-project.org] On Behalf Of Patrick Aboyoun
>> Sent: Tuesday, April 27, 2010 12:39 PM
>> To: Sean Davis
>> Cc: bioc-sig-sequencing at r-project.org
>> Subject: Re: [Bioc-sig-seq] GRanges, failure assigning chromosome lengths
>>
>>
>> Sean and Ivan,
>> Thanks for the insight. I'll look at devising a compromise within the
>> existing framework. I need to explore the various methods for GRanges
>> objects to better understand the impact of a compromise. We started with
>> the simplest interpretation of limit bounds because it simplifies the
>> code. For example, we need to establish the rules for coverage or
>> findOverlaps when the DNA is circular or the alignment runs off the end
>> of a linear chromosome.
>>
>>
>> Patrick
>>
>>
>> On 4/27/10 8:05 AM, Sean Davis wrote:
>>> On Tue, Apr 27, 2010 at 10:51 AM, Ivan Gregoretti <ivangreg at gmail.com> wrote:
>>>    
>>>> Good morning Sean and everybody,
>>>>
>>>>      
>>>>> Actually, the edge case is general as alignments, even on linear
>>>>> chromosomes, may extend beyond the end of the chromosome, I believe.
>>>>> In the best case, these alignments are clipped (in CIGAR terms), but
>>>>> I don't know that all aligners are doing that appropriately.
>>>>>
>>>>> Sean
>>>>>        
>>>> So, you would rather go for an overriding switch than an
>>>> infrastructure overhaul?
>>>>
>>>> I ask this because GRanges is an exceptionally convenient format for
>>>> ChIP-seqers and Patrick is trying to make a decision to make it work
>>>> for real world data.
>>>>      
>>> I guess that I mean to say that the two issues of aligning off the end
>>> of the chromosome and handling circular genomes are related but
>>> separate issues.  An override seems quite reasonable for dealing with
>>> the former.  Until aligners or common formats (BAM/SAM) deal with the
>>> latter, it will be difficult to deal appropriately with circular
>>> genomes, so an override is probably a fine compromise.
>>>
>>> Sean
>>>
>>>
>>>    
>>>> And yes indeed: aligners do align a little bit past the boundaries 
>>>> even for linear chromosomes. Thanks for pointing that out!
>>>>
>>>> Ivan
>>>>
>>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list 
>> Bioc-sig-sequencing at r-project.org 
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>


