[Bioc-devel] Patch to GFF3 reader in rtracklayer
Ryan C. Thompson
rct at thompsonclan.org
Mon Aug 27 17:50:03 CEST 2012
Hi all,
I recently found that rtracklayer's GFF3 file reader was unable to read
GFF3 files produced by Cufflinks. I tracked the problem down to the
occurrence of equals signs in tag values. For example, the following
line was problematic:
C123300344 Cufflinks transcript 1 132 . - . ID=TCONS_00000337;geneID=XLOC_000337;oId=ENSMMUP00000032229;nearest_ref=ENSMMUP00000032229;class_code==;tss_id=TSS337;p_id=P1
due to the "class_code==" part (the value of the class code is actually
an equals sign). Obviously the bug occurs because "strsplit" doesn't
stop after the first split, but keeps splitting at subsequent
occurrences of the separator. I have modified the reader to be able to
handle this case, which as far as I know is perfectly valid. Instead of
strsplit, I use regexpr to find only the *first* occurrence of an equals
sign, and then I use substr to extract the part of the tag before and
after the equals sign. The attached file is a patch against "R/gff.R" in
the rtracklayer dist. I developed the patch against version 1.16.1.
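For readers without the attachment, here is a minimal sketch of the idea in plain R. It is not the code from the patch: the helper name split_first_equals and the shortened attribute string are made up for illustration, and the actual reader in "R/gff.R" is structured differently.

```r
# Hypothetical helper (not taken from the patch): split each "tag=value"
# string at the FIRST "=" only, so values may themselves contain "=".
split_first_equals <- function(x) {
  # Position of the first "=" in each element, or -1 if there is none.
  # as.integer() drops the match.length attributes that regexpr() attaches.
  pos <- as.integer(regexpr("=", x, fixed = TRUE))
  has_eq <- pos > 0
  tag <- ifelse(has_eq, substr(x, 1, pos - 1), x)
  value <- ifelse(has_eq, substr(x, pos + 1, nchar(x)), NA_character_)
  data.frame(tag = tag, value = value, stringsAsFactors = FALSE)
}

# Shortened version of the attribute column from the problematic line
attrs <- "ID=TCONS_00000337;geneID=XLOC_000337;class_code==;tss_id=TSS337;p_id=P1"
fields <- strsplit(attrs, ";", fixed = TRUE)[[1]]

# Splitting on every "=" loses the "=" value of class_code
naive <- strsplit(fields, "=", fixed = TRUE)

# Splitting on the first "=" only keeps it
parsed <- split_first_equals(fields)
print(parsed)
```

With this approach the class_code row keeps "=" as its value, whereas none of the pieces returned by the naive strsplit call for that field contains an equals sign.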
Regards,
-Ryan Thompson