[Bioc-sig-seq] GenomicFeatures, error in type conversion RangeData to GRanges
Patrick Aboyoun
paboyoun at fhcrc.org
Thu Apr 1 21:01:05 CEST 2010
I just checked in a patch to the GenomicRanges package so that the
GRanges constructor now converts NA values in strand to the
both/either strand indicator "*" and issues a warning informing the
user of the change. The updated GenomicRanges package should be
available from bioconductor.org within the next 36 hours. Here is an
example:
> RangedData(IRanges(1,2))
RangedData with 1 row and 0 value columns across 1 space
        space    ranges |
  <character> <IRanges> |
1           1    [1, 2] |
> as(RangedData(IRanges(1,2)), "GRanges")
GRanges with 1 range and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]        1    [1, 2]      * |
seqlengths
 1
NA
Warning message:
In GRanges(seqnames = space(from), ranges = ranges, strand = Rle(strand(from)), :
  missing values in strand converted to "*"
> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-22 r51355)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_0.1.3 IRanges_1.5.74
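To make the new rule concrete, here is a minimal base-R sketch of the same conversion applied to a plain vector of strand codes: NA becomes "*" and a warning is emitted. The helper name normalize_strand is hypothetical and is not part of GenomicRanges or IRanges; it does not touch RangedData or GRanges objects and only illustrates the logic described above, under the assumption that the only legal codes are '+', '-' and '*'.

```r
# Hypothetical helper, NOT part of GenomicRanges: a base-R sketch of the
# rule described above (NA strand -> "*", with a warning).
normalize_strand <- function(strand) {
  strand <- as.character(strand)
  na_idx <- is.na(strand)
  if (any(na_idx)) {
    strand[na_idx] <- "*"
    warning('missing values in strand converted to "*"')
  }
  # Reject anything other than the three strand codes discussed in the thread
  if (!all(strand %in% c("+", "-", "*")))
    stop("strand values must be '+', '-' or '*'")
  factor(strand, levels = c("+", "-", "*"))
}

s <- normalize_strand(c("+", NA, "-"))  # emits the warning
print(as.character(s))                  # [1] "+" "*" "-"
```

Returning a factor with fixed levels keeps all three codes present even when a given vector uses only one of them; that is a design choice of this sketch, not a statement about how GenomicRanges stores strand internally.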
On 4/1/10 8:04 AM, Michael Lawrence wrote:
> Thinking about this some more, it's somewhat analogous to the coercion to
> factor in R, i.e. as.factor(c("male", "female")) returns something
> reasonable, despite missing level information.
>
> as.factor("male") would probably not be what I wanted, but we live with it,
> since the alternative (requiring the levels argument) would probably be
> worse.
>
> On Thu, Apr 1, 2010 at 7:31 AM, Michael Lawrence <michafla at gene.com> wrote:
>
>
>>
>> On Thu, Apr 1, 2010 at 7:22 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>
>>
>>> On 04/01/2010 07:12 AM, Michael Lawrence wrote:
>>>
>>>> On Thu, Apr 1, 2010 at 7:09 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>>
>>>>> On 03/31/2010 07:11 PM, pterry at huskers.unl.edu wrote:
>>>>>
>>>>>> Dear bioc-sig-sequencing,
>>>>>>
>>>>>> I would like to annotate ChIP-seq peaks for the Arabidopsis genome. In
>>>>>> trying to work through the GenomicFeatures vignette dated 03/27/10, I need
>>>>>> to convert my ChIP-seq peaks from a RangedData object to a GRanges object.
>>>>>> In a recent, but earlier, Bioconductor development version, the conversion
>>>>>> of this particular RangedData object worked fine.
>>>>>>
>>>>>> In this more recent Bioconductor development version, I get the following
>>>>>> error message:
>>>>>>
>>>>>>> gr_ChSeqPks <- as(rd0_chr1_s_8_trt_vs_INPctl, "GRanges")
>>>>>> Error in validObject(.Object) :
>>>>>>   invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>>
>>>>>>> rd0_chr1_s_8_trt_vs_INPctl
>>>>>> RangedData with 57 rows and 2 value columns across 1 space
>>>>>>         space             ranges |     ARAB8 ARAB7INPCTL
>>>>>>   <character>          <IRanges> | <integer>   <integer>
>>>>>> 1        chr1 [ 617092,  617094] |        24           0
>>>>>> 2        chr1 [1808262, 1808262] |         8           0
>>>>>> 3        chr1 [3889445, 3889452] |        64           0
>>>>>> 4        chr1 [4404410, 4404410] |         8           0
>>>>>> 5        chr1 [7081127, 7081127] |         8           0
>>>>>> 6        chr1 [7128574, 7128581] |        64           0
>>>>>> 7        chr1 [7128592, 7128649] |       464           0
>>>>>> 8        chr1 [7530777, 7530781] |        40           0
>>>>>> 9        chr1 [7530784, 7530786] |        24           0
>>>>>> ...      ...                ... ...       ...         ...
>>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>> rd = RangedData(IRanges(1, 10))
>>>>>> as(rd, "GRanges")
>>>>>>
>>>>> Error in validObject(.Object) :
>>>>> invalid class "GRanges" object: slot 'strand' contains missing values
>>>>>
>>>>>> rd[["strand"]] = "*"
>>>>>> as(rd, "GRanges")
>>>>>>
>>>>> GRanges with 1 range and 0 elementMetadata values
>>>>> seqnames ranges strand |
>>>>> <Rle> <IRanges> <Rle> |
>>>>> [1] 1 [1, 10] * |
>>>>>
>>>>> seqlengths
>>>>> 1
>>>>> NA
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>> Shouldn't the coerce function just do this automatically?
>>>>
>>> Currently GRanges thinks of strand as '+', '-', or '*', whereas IRanges
>>> allows NA as well (hence the error), so coercing NA to '*' represents a
>>> decision on the part of the investigator that '*' (strand irrelevant) is
>>> synonymous with NA (no information about strand available). Part of the
>>> motivation for this current state of affairs is that the use cases for
>>> both NA and '*' were unclear, but course corrections are welcome.
>>>
>>>
>>>
>> Ok. I guess one could think of the coercion of a RangedData missing a
>> 'strand' column to a GRanges as an equivalent statement, since GRanges
>> requires strand information. If that doesn't sound reasonable, a better
>> error message will help avoid questions like this in the future.
>>
>> Michael
>>
>>
>>
>>
>>
>>> Martin
>>>
>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>>>
>>>>>> R version 2.12.0 Under development (unstable) (2010-03-30 r51506)
>>>>>> x86_64-unknown-linux-gnu
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] biomaRt_2.3.5 GenomicFeatures_0.5.0 GenomicRanges_0.1.0
>>>>>> [4] IRanges_1.5.73
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] Biobase_2.7.5 Biostrings_2.15.26 BSgenome_1.15.20 DBI_0.2-5
>>>>>> [5] RCurl_1.3-1 RSQLite_0.8-4 rtracklayer_1.7.11 tools_2.12.0
>>>>>> [9] XML_2.8-1
>>>>>>
>>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> P. Terry
>>>>>> pterry at huskers.unl.edu
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-sig-sequencing mailing list
>>>>>> Bioc-sig-sequencing at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>>>
>>>>>
>>>>> --
>>>>> Martin Morgan
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>