[Bioc-devel] Patch to GFF3 reader in rtracklayer
Ryan C. Thompson
rct at thompsonclan.org
Fri Aug 31 02:17:31 CEST 2012
So anyway, now that we've figured out the problem of sending the patch,
can we discuss the merits of the patch itself? Currently I have to apply
this patch to every update of rtracklayer in order to make it read GFF
files generated by Cufflinks. I've been happily using rtracklayer with
this patch applied for months.
On 08/27/2012 10:24 AM, Ryan C. Thompson wrote:
> Ok, I put the patch in a GitHub gist, since the list seems to not like
> the patch as an attachment:
>
> https://gist.github.com/3490557
>
> On 08/27/2012 09:32 AM, Ryan C. Thompson wrote:
>> It looks like the attachment was scrubbed from my initial message.
>> Here is another attempt to send it.
>>
>> On Mon 27 Aug 2012 08:50:03 AM PDT, Ryan C. Thompson wrote:
>>> Hi all,
>>>
>>> I recently found that rtracklayer's GFF3 file reader was unable to read
>>> GFF3 files produced by Cufflinks. I tracked the problem down to the
>>> occurrence of equals signs in tag values. For example, the following
>>> line was problematic:
>>>
>>> C123300344 Cufflinks transcript 1 132 . - . ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
>>>
>>> due to the "class_code==" part (the value of the class code is
>>> actually an equals sign). Obviously the bug occurs because "strsplit"
>>> doesn't stop after the first split, but keeps splitting at subsequent
>>> occurrences of the separator. I have modified the reader to be able to
>>> handle this case, which as far as I know is perfectly valid. Instead
>>> of strsplit, I use regexpr to find only the *first* occurrence of an
>>> equals sign, and then I use substr to extract the part of the tag
>>> before and after the equals sign. The attached file is a patch against
>>> "R/gff.R" in the rtracklayer dist. I developed the patch against
>>> version 1.16.1.
>>>
>>> Regards,
>>>
>>> -Ryan Thompson
>>>
>>>
>
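For anyone who does not want to open the gist, the approach described in the original message can be sketched roughly as follows. This is a minimal illustration, not the patch itself and not rtracklayer's actual code: the helper name split_first_equals is hypothetical, and the attribute string is a shortened copy of the Cufflinks line quoted above.

```r
# Hypothetical helper (not part of rtracklayer): split each "tag=value"
# string at the FIRST "=" only, so a value that is itself "=" survives.
split_first_equals <- function(attrs) {
  # Position of the first "=" in each string; -1 where there is none.
  pos <- as.integer(regexpr("=", attrs, fixed = TRUE))
  # With no "=", treat the whole string as the tag and the value as "".
  missing_eq <- pos < 0L
  pos[missing_eq] <- nchar(attrs)[missing_eq] + 1L
  tag <- substr(attrs, 1L, pos - 1L)
  value <- substr(attrs, pos + 1L, nchar(attrs))
  setNames(value, tag)
}

# Shortened attribute column from the problematic line.
attr_col <- "ID=TCONS_00000337;geneID=XLOC_000337;class_code==;tss_id=TSS337;p_id=P1"
fields <- strsplit(attr_col, ";", fixed = TRUE)[[1]]
parsed <- split_first_equals(fields)

print(parsed[["class_code"]])

# For comparison: splitting on every "=" does not keep "=" as the value.
print(strsplit("class_code==", "=", fixed = TRUE)[[1]])
```

The point of using regexpr rather than strsplit for the tag/value split is that regexpr reports only the first match, so everything after that first equals sign is kept verbatim as the value.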