[Bioc-sig-seq] rtracklayer BUG!? import.bed pastes/prefixes 'chr' during import of SOME records
Ivan Gregoretti
ivangreg at gmail.com
Tue May 31 23:08:49 CEST 2011
My vote is to keep conforming to the UCSC conventions.
asRangedData=FALSE can solve the problem of those who need the extra
flexibility.
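
For anyone finding this thread later, here is a minimal sketch of the two workarounds Michael describes below. The file contents and the names bed_path, raw_names and no_track are made up for illustration. trackLine and asRangedData are the argument names given in this thread for rtracklayer 1.12.2; I am assuming nothing about other releases, so the asRangedData call is left as a comment.

```r
# Write a tiny BED file that starts with a UCSC-style track line.
# (Hypothetical data, only loosely modelled on the file in this thread.)
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  'track name="example"',
  "2L\t100\t200",
  "2LHet\t50\t80",
  "X\t10\t20"
), bed_path)

# Base R view of column 1 as stored on disk, skipping the track line.
raw_names <- unique(read.table(
  bed_path, sep = "\t", skip = 1,
  colClasses = c("character", "integer", "integer")
)[, 1])

if (requireNamespace("rtracklayer", quietly = TRUE)) {
  # Workaround 1: ignore the track line, so it cannot trigger
  # the conversion to UCSC-style chromosome names.
  no_track <- rtracklayer::import.bed(bed_path, trackLine = FALSE)

  # Workaround 2, as described for rtracklayer 1.12.2 (an assumption
  # for any other release): ask for a GRanges instead of a RangedData.
  # as_granges <- rtracklayer::import.bed(bed_path, asRangedData = FALSE)
}
```

Comparing raw_names with the sequence names of the imported object is a quick way to confirm that column 1 came through unchanged.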
Ivan
On Tue, May 31, 2011 at 4:54 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Hi,
>
> Your BED file contains a "track line" which is something that is defined by
> the UCSC Genome Browser. I view (and others might disagree, in fact I often
> disagree with myself about this) that the track lines are specific to UCSC
> and not in the core BED format. Thus when rtracklayer sees a track line, it
> by default attempts to make the chromosomes conform to the UCSC convention.
>
> This can be avoided in a number of ways. First, passing trackLine=FALSE will
> cause the track line to be ignored. Second, passing asRangedData=FALSE will
> cause the return value to be a GRanges, in which case it cannot hold the
> track line data (at least not in any formal way -- it could go into the
> metadata) and thus the track line is ignored.
>
> So those are workarounds. I could be convinced though that presence of a
> track line does NOT imply conformance to UCSC conventions.
>
> I fixed the incomplete conversion in devel. Please let me know if it needs
> to be back-ported.
>
> Michael
>
> On Tue, May 31, 2011 at 12:53 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
>> I find that rtracklayer's import.bed function is pasting the string 'chr'
>> to SOME of my chromosome names in my bed file ( f, attached)
>>
>> It is as if it is trying to rename (some) of them to (partially) agree with
>> ucsc's naming convention.
>>
>> I would expect import.bed to preserve column 1 as the name of the 'space',
>> but it does not.
>>
>> But... some of the spaces have had a 'chr' prefix added!
>>
>> Look at the session transcript below!
>>
>> I _think_ this behavior is NEW since I recently upgraded.
>>
>> I hope not to have to revert to previous version of R / bioconductor to
>> address this issue.
>>
>> Any suggestions?
>>
>> Thanks!
>>
>> Malcolm Cook
>> Computational Biology - Stowers Institute for Medical Research
>>
>>
>>
>>
>> > library(rtracklayer)
>> Loading required package: RCurl
>> Loading required package: bitops
>> > unique(read.table('f.bed',sep="\t",skip=1)[,1])
>> [1] YHet dmel_mitochondrion_genome
>> [3] 2L X
>> [5] 3L 4
>> [7] 2R 3R
>> [9] Uextra 2RHet
>> [11] 2LHet 3LHet
>> [13] 3RHet U
>> [15] XHet
>> 15 Levels: 2L 2LHet 2R 2RHet 3L 3LHet 3R 3RHet 4 U Uextra X XHet ...
>> dmel_mitochondrion_genome
>> > unique(space(import.bed ('f.bed')))
>> [1] chr2L 2LHet
>> [3] chr2R 2RHet
>> [5] chr3L 3LHet
>> [7] chr3R 3RHet
>> [9] chr4 chrU
>> [11] Uextra chrX
>> [13] XHet YHet
>> [15] dmel_mitochondrion_genome
>> 15 Levels: chr2L 2LHet chr2R 2RHet chr3L 3LHet chr3R 3RHet chr4 chrU ...
>> dmel_mitochondrion_genome
>> > sessionInfo()
>> R version 2.13.0 (2011-04-13)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] rtracklayer_1.12.2 RCurl_1.5-0 bitops_1.0-4.1
>>
>> loaded via a namespace (and not attached):
>> [1] BSgenome_1.20.0 Biostrings_2.20.1 GenomicRanges_1.4.6
>> [4] IRanges_1.10.4 XML_3.4-0
>> >
>>
>>
>>
>