[Bioc-sig-seq] GRanges, failure assigning chromosome lengths
Martin Morgan
mtmorgan at fhcrc.org
Sat Sep 4 00:28:34 CEST 2010
On 09/03/2010 03:07 PM, Chris Seidel wrote:
> Did anything ever get resolved in terms of assigning chromosome lengths
> to a GRanges object when it contains alignments that run off the
> chromosome ends? The message below was the last of the original thread
> that I could find.
>
> I'm currently having the problem of reading Solexa export files into a
> GRanges object and then sometimes getting an error while setting the
> chromosome lengths if the object has a few reads that extend past the
> boundary. The only solution I see is to toss out the offending reads,
> which means I have to write a complicated function to loop through all
> reads and check them against the chromosome length. Since Ivan brought
> this problem up back in April, I was wondering whether a solution was
> ever reached (or whether anyone knows of an efficient way to address
> the problem).
>
There is also this thread:
https://stat.ethz.ch/pipermail/bioconductor/2010-August/034876.html
It might be as easy as

    gr0 = as(aln, "GRanges"); gr = gr0[gr0 %in% seqs]

where seqs is a RangesList constructed from the chromosome lengths.
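A minimal sketch of the same filter-then-assign idea with a current GenomicRanges release. In later releases `%in%` on ranges compares for equality rather than overlap, so the containment test here is written with `%within%`. The chromosome lengths in `chrlen`, the toy reads in `gr0`, and the name `bounds` are hypothetical illustrations, not values from this thread.

```r
library(GenomicRanges)

## Hypothetical chromosome lengths (illustrative values only)
chrlen <- c(chr1 = 1000L, chrM = 200L)

## Toy alignments: the second read runs past the end of chr1 and the
## third overhangs the end of chrM
gr0 <- GRanges(seqnames = c("chr1", "chr1", "chrM"),
               ranges = IRanges(start = c(10L, 990L, 190L), width = 36L))

## One range per chromosome, spanning 1..length
bounds <- GRanges(seqnames = names(chrlen),
                  ranges = IRanges(start = 1L, end = chrlen))

## Keep only alignments lying entirely within their chromosome
gr <- gr0[gr0 %within% bounds]

## With the offending reads gone, the lengths can be assigned
seqlengths(gr) <- chrlen[seqlevels(gr)]
gr
```

Only the first read survives. Dropping reads is one policy; whether to drop, clip, or wrap them (for circular genomes) is exactly the design question discussed further down the thread.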
A common source of this problem is mapping to the mitochondrial genome or
other circular genomes (so checking for overhanging alignments on chrM
alone might be enough); this is being actively worked on, but it is a
deeper issue than it appears at first blush.
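A minimal sketch, under the same assumptions (hypothetical `chrlen` and toy reads in `gr0`), of checking where the overhanging alignments come from. The comparison against chromosome length is vectorized, so no explicit loop over reads is needed.

```r
library(GenomicRanges)

## Hypothetical chromosome lengths and toy alignments (illustrative only)
chrlen <- c(chr1 = 1000L, chrM = 200L)
gr0 <- GRanges(seqnames = c("chr1", "chr1", "chrM", "chrM"),
               ranges = IRanges(start = c(10L, 500L, 5L, 190L), width = 36L))

## Look up each read's chromosome length as a plain integer vector
len <- unname(chrlen[as.character(seqnames(gr0))])

## Flag reads that start before position 1 or end past the chromosome end
out_of_bounds <- start(gr0) < 1L | end(gr0) > len

## Count offending reads per chromosome, e.g. to see whether they are all on chrM
per_chrom <- table(as.character(seqnames(gr0))[out_of_bounds])
per_chrom
```

In this toy data only the last chrM read (positions 190 to 225 on a 200 bp sequence) is flagged.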
Martin
> -Chris
>
>> -----Original Message-----
>> From: bioc-sig-sequencing-bounces at r-project.org
>> [mailto:bioc-sig-sequencing-bounces at r-project.org] On Behalf
>> Of Patrick Aboyoun
>> Sent: Tuesday, April 27, 2010 12:39 PM
>> To: Sean Davis
>> Cc: bioc-sig-sequencing at r-project.org
>> Subject: Re: [Bioc-sig-seq] GRanges, failure assigning
>> chromosome lengths
>>
>>
>> Sean and Ivan,
>> Thanks for the insight. I'll look at devising a compromise within the
>> existing framework. I need to explore the various methods for GRanges
>> objects to better understand the impact of a compromise. We started
>> with the simplest interpretation of limit bounds because it simplifies
>> the code. For example, we need to establish the rules for coverage or
>> findOverlaps when the DNA is circular or when an alignment runs off the
>> end of a linear chromosome.
>>
>>
>> Patrick
>>
>>
>> On 4/27/10 8:05 AM, Sean Davis wrote:
>>> On Tue, Apr 27, 2010 at 10:51 AM, Ivan Gregoretti <ivangreg at gmail.com>
>>> wrote:
>>>
>>>> Good morning Sean and everybody,
>>>>
>>>>
>>>>> Actually, the edge case is general, as alignments, even on linear
>>>>> chromosomes, may extend beyond the end of the chromosome, I believe.
>>>>> In the best case, these alignments are clipped (in CIGAR terms), but
>>>>> I don't know that all aligners are doing that appropriately.
>>>>>
>>>>> Sean
>>>>>
>>>> So, you would rather go for an override switch than an
>>>> infrastructure overhaul?
>>>>
>>>> I ask this because GRanges is an exceptionally convenient format for
>>>> ChIP-seqers, and Patrick is trying to decide how to make it work for
>>>> real-world data.
>>>>
>>> I guess what I mean to say is that aligning off the end of the
>>> chromosome and handling circular genomes are related but separate
>>> issues. An override seems quite reasonable for dealing with the
>>> former. Until aligners or common formats (BAM/SAM) deal with the
>>> latter, it will be difficult to handle circular genomes
>>> appropriately, so an override is probably a fine compromise.
>>>
>>> Sean
>>>
>>>
>>>
>>>> And yes indeed: aligners do align a little bit past the boundaries
>>>> even for linear chromosomes. Thanks for pointing that out!
>>>>
>>>> Ivan
>>>>
>>>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>