[Bioc-sig-seq] GRanges, failure assigning chromosome lengths
Ivan Gregoretti
ivangreg at gmail.com
Mon Apr 26 23:17:06 CEST 2010
Hi Patrick.
Sure. Here is the output. To the best of my understanding, most
aligners assume that, for instance, 'ch4_random' is just a linear
continuation of 'chr4'.
Thank you,
Ivan
> range(A)
GRanges with 67 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [3000311, 197195093] + |
[2] chr1 [2999944, 197195305] - |
[3] chr10 [3000180, 129992771] + |
[4] chr10 [3000092, 129993250] - |
[5] chr11 [3000006, 121843794] + |
[6] chr11 [3000010, 121843708] - |
[7] chr12 [3000095, 121257390] + |
[8] chr12 [3000144, 121257410] - |
[9] chr13 [3000434, 120284396] + |
... ... ... ... ...
[59] chrUn_random [ -21, 5900319] - |
[60] chrX [3000042, 166627000] + |
[61] chrX [2999944, 166650266] - |
[62] chrX_random [ 326, 1051594] + |
[63] chrX_random [ 229, 1309438] - |
[64] chrY [ 150, 2902454] + |
[65] chrY [ -2, 2902181] - |
[66] chrY_random [ 7539, 58497416] + |
[67] chrY_random [ 1247, 58489658] - |
seqlengths
chr1 chr10 chr11 ... chrY chrY_random
NA NA NA ... NA NA
> range(A[seqnames(A)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chrM [ 4, 16387] + |
[2] chrM [-82, 16298] - |
> range(B)
GRanges with 65 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [3000506, 197194916] + |
[2] chr1 [3000977, 197194382] - |
[3] chr10 [3000626, 129989507] + |
[4] chr10 [3000643, 129988346] - |
[5] chr11 [3000057, 121843783] + |
[6] chr11 [3000010, 121843699] - |
[7] chr12 [3001370, 121256864] + |
[8] chr12 [3000135, 121257132] - |
[9] chr13 [3000103, 120284379] + |
... ... ... ... ...
[57] chrUn_random [ 125, 5900272] - |
[58] chrX [3001345, 166614519] + |
[59] chrX [3000031, 166628394] - |
[60] chrX_random [ 1177, 1051588] + |
[61] chrX_random [ 254, 1051506] - |
[62] chrY [ 141, 2902458] + |
[63] chrY [ 744, 2901128] - |
[64] chrY_random [ 39555, 58472198] + |
[65] chrY_random [ 71505, 58487987] - |
seqlengths
chr1 chr10 chr11 ... chrY chrY_random
NA NA NA ... NA NA
> range(B[seqnames(B)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chrM [ 5, 16385] + |
[2] chrM [-77, 16299] - |
> range(C)
GRanges with 67 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr1 [3000125, 197195029] + |
[2] chr1 [2999981, 197194997] - |
[3] chr10 [3000098, 129993293] + |
[4] chr10 [3000105, 129993219] - |
[5] chr11 [3000007, 121843789] + |
[6] chr11 [3000010, 121843707] - |
[7] chr12 [3000113, 121257571] + |
[8] chr12 [3000011, 121257521] - |
[9] chr13 [3000163, 120284384] + |
... ... ... ... ...
[59] chrUn_random [ -75, 5900330] - |
[60] chrX [3000056, 166650160] + |
[61] chrX [2999951, 166650076] - |
[62] chrX_random [ 327, 1200747] + |
[63] chrX_random [ 236, 1508524] - |
[64] chrY [ 38, 2902563] + |
[65] chrY [ -80, 2902221] - |
[66] chrY_random [ 5699, 58492209] + |
[67] chrY_random [ 7154, 58486839] - |
seqlengths
chr1 chr10 chr11 ... chrY chrY_random
NA NA NA ... NA NA
> range(C[seqnames(C)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chrM [ 1, 16387] + |
[2] chrM [-81, 16332] - |
library(BSgenome.Mmusculus.UCSC.mm9)
> length(Mmusculus$chrM)
[1] 16299
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
On Mon, Apr 26, 2010 at 4:35 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Ivan,
> Could you provide me with the results of
>
> range(Z)
>
> for all three of the GRanges objects that seqlengths<- throws and error on?
> I would like to see if there is some adjustment we can make to the
> seqlengths<- function that will resolve your issue.
>
> Thanks,
> Patrick
>
>
> On 4/26/10 12:57 PM, Ivan Gregoretti wrote:
>>
>> Hi Patrick,
>>
>> You are correct. The validity check is rejecting the input. The
>> problem with that is that I have three set, all of them failing perhap
>> because one or two tags are out of bounds.
>>
>> This tags are coming straight from the Illumina sequencer. There is
>> no manipulation. Perhaps it is complaining because the validity check
>> does not know that Mitochondrial DNA is circular.
>>
>> Bottom line, anybody attempting to load the data as GRanges from a
>> high coverage tag set will find this seqlengths() rejection.
>>
>> Is there any way to override it or should I just refrain from
>> assigning chromosome lengths? The online vignette does not mention if
>> there is a switch to the validity check.
>>
>> Thank you,
>>
>> Ivan
>>
>>
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>>
>>
>> On Mon, Apr 26, 2010 at 3:42 PM, Patrick Aboyoun<paboyoun at fhcrc.org>
>> wrote:
>>
>>>
>>> Ivan,
>>> As you probably already realized, the first error you encountered was do
>>> to
>>> a misuse of the seqlengths function since function objects (e.g.
>>> BSgenome)
>>> have no sequence lengths.
>>>
>>> # Here is the problem
>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "seqlengths", for
>>> signature "function"
>>>
>>> The second error is the result of a failed validity check on the modified
>>> object. All ranges stored in a GRanges object must be between 1 and
>>> seqlengths(object) if the seqlengths information is non-NAs.
>>>
>>> # Here is a second attempt that also fails
>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>> Error in validObject(.Object) :
>>> invalid class "GRanges" object: slot 'ranges' contains values
>>> outside of sequence bounds
>>>
>>> Compare the results of range(Z), which returns the min start and max end
>>> for
>>> each of the seqnames in Z, and compare it with
>>> seqlengths(Mmusculus)[names(seqlengths(Z))]. This should provide you with
>>> some insight as to which ranges are out of bounds. Perhaps your intervals
>>> are 0-based instead of 1-based?
>>>
>>>
>>> Cheers,
>>> Patrick
>>>
>>>
>>> On 4/26/10 11:08 AM, Ivan Gregoretti wrote:
>>>
>>>>
>>>> Hello listers,
>>>>
>>>> Is anybody having trouble assigning seqlengths() to a GRanges instance
>>>> with the new GenomicRanges version?
>>>>
>>>>
>>>> This morning I upgraded my GenomicRanges from 0.0.9 to 0.1.17 and
>>>> since then I am unable to assign chromosome lengths to any of my tag
>>>> sets from my Illumina 36 nucleotide sequences.
>>>>
>>>> On Friday this worked. Let me show you how it complains now:
>>>>
>>>> Z<- import('millionsoftags.bed.gz', 'bed')
>>>>
>>>> Z<- as(Z, 'GRanges')
>>>>
>>>> class(Z)
>>>> [1] "GRanges"
>>>> attr(,"package")
>>>> [1] "GenomicRanges"
>>>>
>>>> Z
>>>> GRanges with 23293177 ranges and 2 elementMetadata values
>>>> seqnames ranges strand |
>>>> <Rle> <IRanges> <Rle> |
>>>> [1] chr1 [3000506, 3000541] + |
>>>> [2] chr1 [3001061, 3001096] - |
>>>> [3] chr1 [3001075, 3001110] - |
>>>> [4] chr1 [3001098, 3001133] + |
>>>> [5] chr1 [3001310, 3001345] + |
>>>> [6] chr1 [3001559, 3001594] + |
>>>> [7] chr1 [3001603, 3001638] + |
>>>> [8] chr1 [3001603, 3001638] + |
>>>> [9] chr1 [3001609, 3001644] - |
>>>> ... ... ... ... ...
>>>> [23293169] chrY_random [58402685, 58402720] + |
>>>> [23293170] chrY_random [58403358, 58403393] + |
>>>> [23293171] chrY_random [58406154, 58406189] + |
>>>> [23293172] chrY_random [58411077, 58411112] - |
>>>> [23293173] chrY_random [58430677, 58430712] + |
>>>> [23293174] chrY_random [58435117, 58435152] - |
>>>> [23293175] chrY_random [58472079, 58472114] + |
>>>> [23293176] chrY_random [58483725, 58483760] - |
>>>> [23293177] chrY_random [58487952, 58487987] - |
>>>> name score
>>>> <character> <numeric>
>>>> [1] HWI-EAS179_1:7:39:506:1302 96
>>>> [2] HWI-EAS179_1:2:69:562:1539 119
>>>> [3] HWI-EAS179_1:8:28:1327:394 119
>>>> [4] HWI-EAS179_1:7:96:619:454 119
>>>> [5] HWI-EAS179_49:3:4:1219:1729 119
>>>> [6] HWI-EAS179_49:3:88:949:558 118
>>>> [7] HWI-EAS179_1:7:60:1151:1790 119
>>>> [8] HWI-EAS179_1:7:61:1586:147 114
>>>> [9] HWI-EAS179_1:7:55:813:365 106
>>>> ... ... ...
>>>> [23293169] HWI-EAS179_1:7:49:1416:1573 17
>>>> [23293170] HWI-EAS179_1:8:25:405:1723 59
>>>> [23293171] HWI-EAS179_1:7:75:1366:1224 25
>>>> [23293172] HWI-EAS179_1:2:5:1338:80 5
>>>> [23293173] HWI-EAS179_49:3:13:151:166 83
>>>> [23293174] HWI-EAS179_49:3:29:1091:472 6
>>>> [23293175] HWI-EAS179_1:2:69:1424:733 17
>>>> [23293176] HWI-EAS179_1:7:16:945:1051 25
>>>> [23293177] HWI-EAS179_1:7:74:1117:1801 14
>>>>
>>>> seqlengths
>>>> chr1 chr10 chr11 ... chrY chrY_random
>>>> NA NA NA ... NA NA
>>>>
>>>> # Here is the problem
>>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>>> Error in function (classes, fdef, mtable) :
>>>> unable to find an inherited method for function "seqlengths", for
>>>> signature "function"
>>>>
>>>> # Here is a second attempt that also fails
>>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>>> Error in validObject(.Object) :
>>>> invalid class "GRanges" object: slot 'ranges' contains values
>>>> outside of sequence bounds
>>>>
>>>>
>>>> As you can see, I haven't had the chance to mess the data.
>>>> Any idea how to circumvent this problem?
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> sessionInfo()
>>>> R version 2.12.0 Under development (unstable) (2010-03-25 r51410)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.15.21
>>>> [3] Biostrings_2.15.27 GenomicRanges_0.1.17
>>>> [5] IRanges_1.5.79 rtracklayer_1.7.12
>>>> [7] RCurl_1.4-1 bitops_1.0-4.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.7.6 tools_2.12.0 XML_2.8-1
>>>>
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>>
>>>
>>>
>
>
More information about the Bioc-sig-sequencing
mailing list