[Bioc-sig-seq] behavior of XStringSet after c() step

Hervé Pagès hpages at fhcrc.org
Wed Nov 25 08:13:50 CET 2009


Hi Thomas,

This is fixed in release (Biostrings 2.14.8 / IRanges 1.4.8) and
devel (Biostrings 2.15.9 / IRanges 1.5.10).
In addition to the methods you reported below, I found a few more
methods that still did not support XStringSet objects with a pool
of length > 1: compact(), plus the coercion methods from one
XStringSet subtype (B/DNA/RNA/AA) to another.

The new versions of Biostrings / IRanges should become available
through biocLite() in the next 24 hours.
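
Once the update is installed, one way to check it is to rerun the example
from the original report. This is only a sketch: it assumes the fixed
versions listed above are the ones loaded, and the expected values are the
ones printed for the workaround object (dset3fix) further down in this
thread. The names rc and prefixes are just illustrative.

```r
library(Biostrings)

# Same objects as in the original report.
dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
dset3 <- c(dset1, dset2)  # combined object: the case that used to fail

# Both calls raised an internal error before the fix.
rc <- as.character(reverseComplement(dset3))
prefixes <- as.character(DNAStringSet(dset3, start=1, end=4))

# Compare against the results obtained with the strsplit() workaround.
stopifnot(all(rc == c("GTAATATGC", "GGATCGATT", "GTAATATGC",
                      "GTAATATGCGG", "GGATCGATTTT", "GTATTATATGC")))
stopifnot(all(prefixes == c("GCAT", "AATC", "GCAT", "CCGC", "AAAA", "GCAT")))
```

If either stopifnot() fails, or the internal error still shows up, check
sessionInfo() to see which Biostrings / IRanges versions are loaded.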

Cheers,
H.


Thomas Girke wrote:
> Hi Hervé,
> 
> Thanks for the clarification. Right now this is just a slight inconvenience,
> whereas the support for larger object sizes is a very welcome major 
> improvement.
> 
> Thanks for doing this.
> 
> Thomas
> 
> 
> On Mon, Nov 23, 2009 at 05:10:44PM -0800, Hervé Pagès wrote:
>> Hi Thomas,
>>
>> The internals of the XStringSet container changed in BioC 2.5 for two
>> reasons. First, to support bigger objects: an object can now hold more
>> than 2^31 letters in total. The limit is now 2^31 letters per element,
>> and the maximum number of elements is 2^31, much like for standard
>> character vectors. Second, to support more efficient combining thru c()
>> or append(), which is now achieved with no copying of the sequence data.
>> This change is why reverseComplement(), reverse(), complement() and
>> chartr() are currently broken on XStringSet objects that have gone thru
>> combining. Most methods that operate on XStringSet objects were adapted,
>> except those 4, for lack of time. I'm working on this right now and will
>> post again here when it's fixed. Thanks for the reminder and sorry for
>> the inconvenience.
>>
>> Cheers,
>> H.
>>
>>
>> Thomas Girke wrote:
>>> Dear List,
>>>
>>> Is there an explanation for the difference in behavior between
>>> XStringSet objects that have gone through an append() or c() step
>>> and those that haven't? I did not observe this problem in the
>>> previous R/BioC release.
>>>
>>> Below is a simple example to reproduce this error.
>>>
>>> Thanks in advance for your help.
>>>
>>> Thomas
>>>
>>> ## Example
>>>> library(Biostrings)
>>>> dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
>>>> dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
>>>> dset3 <- c(dset1, dset2) # using append() doesn't fix the problem
>>>> reverseComplement(dset3)
>>> Error in .local(x, ...) : IRanges internal error: length(x) != 1
>>>
>>>> DNAStringSet(dset3, start=1, end=4)
>>> Error in super(x) : Biostrings internal error: length(x@pool) != 1
>>>
>>> ## The problem goes away by doing the following
>>>> dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))
>>>> reverseComplement(dset3fix)
>>>  A DNAStringSet instance of length 6
>>>    width seq
>>> [1]     9 GTAATATGC
>>> [2]     9 GGATCGATT
>>> [3]     9 GTAATATGC
>>> [4]    11 GTAATATGCGG
>>> [5]    11 GGATCGATTTT
>>> [6]    11 GTATTATATGC
>>>
>>>
>>>> DNAStringSet(dset3fix, start=1, end=4)
>>>  A DNAStringSet instance of length 6
>>>    width seq
>>> [1]     4 GCAT
>>> [2]     4 AATC
>>> [3]     4 GCAT
>>> [4]     4 CCGC
>>> [5]     4 AAAA
>>> [6]     4 GCAT
>>>
>>>
>>>> sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               
>>> LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C       
>>> LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
>>> [9] LC_ADDRESS=C               LC_TELEPHONE=C             
>>> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.14.1 IRanges_1.4.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.6.0
>>>
>> -- 
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>


