[Bioc-sig-seq] behavior of XStringSet after c() step
Hervé Pagès
hpages at fhcrc.org
Wed Nov 25 08:13:50 CET 2009
Hi Thomas,
This is fixed in release (Biostrings 2.14.8 / IRanges 1.4.8) and
devel (Biostrings 2.15.9 / IRanges 1.5.10).
In addition to the methods you reported below, I found a few more
methods that still did not support XStringSet objects with
a pool of length > 1: compact() and the coercion methods from one
XStringSet subtype (B/DNA/RNA/AA) to another.
The new versions of Biostrings / IRanges should become available
through biocLite() within the next 24 hours.
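
A minimal sketch for confirming that the update has arrived, assuming the
release-branch version numbers quoted above. The helper has_min_version()
is hypothetical (it is not part of Biostrings or IRanges); the rest simply
re-runs the example from the original report:

```r
## Hypothetical helper (not part of Biostrings or IRanges): TRUE if `pkg`
## is installed at version `min_version` or later, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
    nzchar(system.file(package = pkg)) &&
        utils::packageVersion(pkg) >= min_version
}

## Release-branch versions from this message; on the devel branch,
## check against 2.15.9 / 1.5.10 instead.
fixed <- has_min_version("Biostrings", "2.14.8") &&
         has_min_version("IRanges", "1.4.8")

if (fixed) {
    library(Biostrings)
    dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
    dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
    dset3 <- c(dset1, dset2)
    ## Both calls failed with Biostrings 2.14.1 / IRanges 1.4.3.
    print(reverseComplement(dset3))
    print(DNAStringSet(dset3, start=1, end=4))
} else {
    message("Biostrings / IRanges not updated yet; try biocLite() again later.")
}
```

If the check fails, the strsplit(toString()) workaround from the original
report can be kept in place until the new versions are installed.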
Cheers,
H.
Thomas Girke wrote:
> Hi Hervé,
>
> Thanks for the clarification. Right now this is just a slight inconvenience,
> whereas the support for larger object sizes is a very welcome major
> improvement.
>
> Thanks for doing this.
>
> Thomas
>
>
> On Mon, Nov 23, 2009 at 05:10:44PM -0800, Hervé Pagès wrote:
>> Hi Thomas,
>>
>> The internals of the XStringSet container changed in BioC 2.5 for two
>> reasons. The first is to support bigger objects, i.e. objects that can
>> hold more than 2^31 letters in total: the limit is now 2^31 letters per
>> element, and the maximum number of elements is 2^31, very much like for
>> standard character vectors. The second is to support more efficient
>> combining thru c() or append(), which is now achieved with no copying
>> of the sequence data. reverseComplement(), reverse(), complement() and
>> chartr() are currently broken on XStringSet objects that have gone thru
>> combining because of this change in the internals: most methods that
>> operate on XStringSet objects were adapted, but not these 4, for lack
>> of time. I'm working on this right now and will post again here when
>> it's fixed. Thanks for the reminder and sorry for the inconvenience.
>>
>> Cheers,
>> H.
>>
>>
>> Thomas Girke wrote:
>>> Dear List,
>>>
>>> Is there an explanation for why XStringSet objects that have gone
>>> through an append() or c() step now behave differently from those
>>> that haven't? I do not observe this problem with the previous
>>> R/BioC release.
>>>
>>> Below is a simple example to reproduce this error.
>>>
>>> Thanks in advance for your help.
>>>
>>> Thomas
>>>
>>> ## Example
>>> > library(Biostrings)
>>> > dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
>>> > dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
>>> > dset3 <- c(dset1, dset2) # using append() doesn't fix the problem
>>> > reverseComplement(dset3)
>>> Error in .local(x, ...) : IRanges internal error: length(x) != 1
>>>
>>> > DNAStringSet(dset3, start=1, end=4)
>>> Error in super(x) : Biostrings internal error: length(x@pool) != 1
>>>
>>> ## The problem goes away by doing the following
>>> > dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))
>>> > reverseComplement(dset3fix)
>>>   A DNAStringSet instance of length 6
>>>     width seq
>>> [1]     9 GTAATATGC
>>> [2]     9 GGATCGATT
>>> [3]     9 GTAATATGC
>>> [4]    11 GTAATATGCGG
>>> [5]    11 GGATCGATTTT
>>> [6]    11 GTATTATATGC
>>>
>>>
>>> > DNAStringSet(dset3fix, start=1, end=4)
>>>   A DNAStringSet instance of length 6
>>>     width seq
>>> [1]     4 GCAT
>>> [2]     4 AATC
>>> [3]     4 GCAT
>>> [4]     4 CCGC
>>> [5]     4 AAAA
>>> [6]     4 GCAT
>>>
>>>
>>> > sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.14.1 IRanges_1.4.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.6.0
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>