[Bioc-devel] Patch to GFF3 reader in rtracklayer
Ryan C. Thompson
rct at thompsonclan.org
Mon Aug 27 19:24:16 CEST 2012
OK, I put the patch in a GitHub gist, since the list does not seem to
accept patches as attachments:
https://gist.github.com/3490557
On 08/27/2012 09:32 AM, Ryan C. Thompson wrote:
> It looks like the attachment was scrubbed from my initial message.
> Here is another attempt to send it.
>
> On Mon 27 Aug 2012 08:50:03 AM PDT, Ryan C. Thompson wrote:
>> Hi all,
>>
>> I recently found that rtracklayer's GFF3 file reader was unable to read
>> GFF3 files produced by Cufflinks. I tracked the problem down to the
>> occurrence of equals signs in tag values. For example, the following
>> line was problematic:
>>
>> C123300344 Cufflinks transcript 1 132 . - . ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
>>
>> due to the "class_code==" part (the value of the class code is
>> actually an equals sign). Obviously the bug occurs because "strsplit"
>> doesn't stop after the first split, but keeps splitting at subsequent
>> occurrences of the separator. I have modified the reader to be able to
>> handle this case, which as far as I know is perfectly valid. Instead
>> of strsplit, I use regexpr to find only the *first* occurrence of an
>> equals sign, and then I use substr to extract the part of the tag
>> before and after the equals sign. The attached file is a patch against
>> "R/gff.R" in the rtracklayer dist. I developed the patch against
>> version 1.16.1.
>>
>> Regards,
>>
>> -Ryan Thompson
>>
>>
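
The approach described in the quoted message can be sketched roughly as follows. This is only an illustration, not the patch itself (the actual change is in the gist linked above): the helper name split_first_equals and the shortened attribute string are made up for this example, and treating a tag with no equals sign as having an NA value is an assumption.

```r
# Hypothetical helper, NOT the actual rtracklayer code: split each
# "tag=value" string at the first "=" only, so values may contain "=".
split_first_equals <- function(x) {
  # Position of the first "=" in each element, or -1 if there is none.
  # as.integer() drops the extra attributes that regexpr() attaches.
  pos <- as.integer(regexpr("=", x, fixed = TRUE))
  has_eq <- pos > 0
  # Everything before the first "=" is the tag, everything after is the value.
  tag <- ifelse(has_eq, substr(x, 1, pos - 1), x)
  # Assumption: a tag without "=" gets a missing value.
  value <- ifelse(has_eq, substr(x, pos + 1, nchar(x)), NA_character_)
  data.frame(tag = tag, value = value, stringsAsFactors = FALSE)
}

# Shortened version of the attribute column from the problematic line.
attrs <- "ID=TCONS_00000337;geneID=XLOC_000337;class_code==;tss_id=TSS337;p_id=P1"
fields <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Splitting on every "=" does not recover "=" as the value of class_code.
naive <- strsplit("class_code==", "=", fixed = TRUE)[[1]]

# Splitting at the first "=" only keeps the value intact.
parsed <- split_first_equals(fields)
print(parsed)
```

Here regexpr is used instead of strsplit for the tag/value step because it reports only the first match, which is the behavior the message relies on.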