[Bioc-sig-seq] GRanges, failure assigning chromosome lengths
Patrick Aboyoun
paboyoun at fhcrc.org
Mon Apr 26 22:35:45 CEST 2010
Ivan,
Could you provide me with the results of
range(Z)
for all three of the GRanges objects on which seqlengths<- throws an
error? I would like to see if there is some adjustment we can make to the
seqlengths<- function that will resolve your issue.
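
A minimal sketch of the comparison being asked for, using base R only so it runs without Bioconductor. The data frame `tags` and the lengths in `seq_len_by_name` are hypothetical toy values, not real mm9 lengths. With an actual GRanges object Z, the three columns could presumably be taken from as.character(seqnames(Z)), start(Z) and end(Z), and the lengths from seqlengths(Mmusculus).

```r
# Hypothetical toy tags standing in for the ranges of a GRanges object.
tags <- data.frame(
  seqname = c("chr1", "chr1", "chrM", "chrM"),
  start   = c(100L, 950L, 10L, 290L),
  end     = c(135L, 985L, 45L, 325L),
  stringsAsFactors = FALSE
)

# Made-up sequence lengths, named by sequence (assumption, not mm9 values).
seq_len_by_name <- c(chr1 = 1000L, chrM = 300L)

# Per-sequence minimum start and maximum end, analogous to range(Z).
min_start <- tapply(tags$start, tags$seqname, min)
max_end   <- tapply(tags$end, tags$seqname, max)

# Look up the length of the sequence each tag sits on.
tag_seq_len <- unname(seq_len_by_name[tags$seqname])

# A tag is out of bounds if it starts before 1 or ends past the sequence end.
out_of_bounds <- tags$start < 1L | tags$end > tag_seq_len

# Rows that would make the validity check fail.
offending <- tags[out_of_bounds, ]
print(offending)
```

If only a handful of tags on one sequence (for example a circular one such as chrM) show up in `offending`, that would point to reads running past the end of the reference rather than to a 0-based versus 1-based offset.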
Thanks,
Patrick
On 4/26/10 12:57 PM, Ivan Gregoretti wrote:
> Hi Patrick,
>
> You are correct. The validity check is rejecting the input. The
> problem is that I have three sets, all of them failing, perhaps
> because one or two tags are out of bounds.
>
> These tags are coming straight from the Illumina sequencer. There is
> no manipulation. Perhaps it is complaining because the validity check
> does not know that Mitochondrial DNA is circular.
>
> Bottom line, anybody attempting to load the data as GRanges from a
> high coverage tag set will find this seqlengths() rejection.
>
> Is there any way to override it or should I just refrain from
> assigning chromosome lengths? The online vignette does not mention if
> there is a switch to the validity check.
>
> Thank you,
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>
>
>
> On Mon, Apr 26, 2010 at 3:42 PM, Patrick Aboyoun<paboyoun at fhcrc.org> wrote:
>
>> Ivan,
>> As you probably already realized, the first error you encountered was due to
>> a misuse of the seqlengths function, since function objects (e.g. BSgenome)
>> have no sequence lengths.
>>
>> # Here is the problem
>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "seqlengths", for
>> signature "function"
>>
>> The second error is the result of a failed validity check on the modified
>> object. All ranges stored in a GRanges object must lie between 1 and
>> seqlengths(object) wherever the seqlengths information is not NA.
>>
>> # Here is a second attempt that also fails
>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>> Error in validObject(.Object) :
>> invalid class "GRanges" object: slot 'ranges' contains values
>> outside of sequence bounds
>>
>> Compare the results of range(Z), which returns the min start and max end for
>> each of the seqnames in Z, with
>> seqlengths(Mmusculus)[names(seqlengths(Z))]. This should provide you with
>> some insight as to which ranges are out of bounds. Perhaps your intervals
>> are 0-based instead of 1-based?
>>
>>
>> Cheers,
>> Patrick
>>
>>
>> On 4/26/10 11:08 AM, Ivan Gregoretti wrote:
>>
>>> Hello listers,
>>>
>>> Is anybody having trouble assigning seqlengths() to a GRanges instance
>>> with the new GenomicRanges version?
>>>
>>>
>>> This morning I upgraded my GenomicRanges from 0.0.9 to 0.1.17 and
>>> since then I am unable to assign chromosome lengths to any of my tag
>>> sets from my Illumina 36 nucleotide sequences.
>>>
>>> On Friday this worked. Let me show you how it complains now:
>>>
>>> Z<- import('millionsoftags.bed.gz', 'bed')
>>>
>>> Z<- as(Z, 'GRanges')
>>>
>>> class(Z)
>>> [1] "GRanges"
>>> attr(,"package")
>>> [1] "GenomicRanges"
>>>
>>> Z
>>> GRanges with 23293177 ranges and 2 elementMetadata values
>>> seqnames ranges strand |
>>> <Rle> <IRanges> <Rle> |
>>> [1] chr1 [3000506, 3000541] + |
>>> [2] chr1 [3001061, 3001096] - |
>>> [3] chr1 [3001075, 3001110] - |
>>> [4] chr1 [3001098, 3001133] + |
>>> [5] chr1 [3001310, 3001345] + |
>>> [6] chr1 [3001559, 3001594] + |
>>> [7] chr1 [3001603, 3001638] + |
>>> [8] chr1 [3001603, 3001638] + |
>>> [9] chr1 [3001609, 3001644] - |
>>> ... ... ... ... ...
>>> [23293169] chrY_random [58402685, 58402720] + |
>>> [23293170] chrY_random [58403358, 58403393] + |
>>> [23293171] chrY_random [58406154, 58406189] + |
>>> [23293172] chrY_random [58411077, 58411112] - |
>>> [23293173] chrY_random [58430677, 58430712] + |
>>> [23293174] chrY_random [58435117, 58435152] - |
>>> [23293175] chrY_random [58472079, 58472114] + |
>>> [23293176] chrY_random [58483725, 58483760] - |
>>> [23293177] chrY_random [58487952, 58487987] - |
>>> name score
>>> <character> <numeric>
>>> [1] HWI-EAS179_1:7:39:506:1302 96
>>> [2] HWI-EAS179_1:2:69:562:1539 119
>>> [3] HWI-EAS179_1:8:28:1327:394 119
>>> [4] HWI-EAS179_1:7:96:619:454 119
>>> [5] HWI-EAS179_49:3:4:1219:1729 119
>>> [6] HWI-EAS179_49:3:88:949:558 118
>>> [7] HWI-EAS179_1:7:60:1151:1790 119
>>> [8] HWI-EAS179_1:7:61:1586:147 114
>>> [9] HWI-EAS179_1:7:55:813:365 106
>>> ... ... ...
>>> [23293169] HWI-EAS179_1:7:49:1416:1573 17
>>> [23293170] HWI-EAS179_1:8:25:405:1723 59
>>> [23293171] HWI-EAS179_1:7:75:1366:1224 25
>>> [23293172] HWI-EAS179_1:2:5:1338:80 5
>>> [23293173] HWI-EAS179_49:3:13:151:166 83
>>> [23293174] HWI-EAS179_49:3:29:1091:472 6
>>> [23293175] HWI-EAS179_1:2:69:1424:733 17
>>> [23293176] HWI-EAS179_1:7:16:945:1051 25
>>> [23293177] HWI-EAS179_1:7:74:1117:1801 14
>>>
>>> seqlengths
>>> chr1 chr10 chr11 ... chrY chrY_random
>>> NA NA NA ... NA NA
>>>
>>> # Here is the problem
>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "seqlengths", for
>>> signature "function"
>>>
>>> # Here is a second attempt that also fails
>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>> Error in validObject(.Object) :
>>> invalid class "GRanges" object: slot 'ranges' contains values
>>> outside of sequence bounds
>>>
>>>
>>> As you can see, I haven't had the chance to mess with the data.
>>> Any idea how to circumvent this problem?
>>>
>>> Thank you,
>>>
>>> Ivan
>>>
>>>
>>> sessionInfo()
>>> R version 2.12.0 Under development (unstable) (2010-03-25 r51410)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.15.21
>>> [3] Biostrings_2.15.27 GenomicRanges_0.1.17
>>> [5] IRanges_1.5.79 rtracklayer_1.7.12
>>> [7] RCurl_1.4-1 bitops_1.0-4.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.7.6 tools_2.12.0 XML_2.8-1
>>>
>>>
>>> Ivan Gregoretti, PhD
>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>> National Institutes of Health
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>>
>>