[Bioc-sig-seq] behavior of XStringSet after c() step

Thomas Girke thomas.girke at ucr.edu
Tue Nov 24 02:36:59 CET 2009


Hi Hervé,

Thanks for the clarification. Right now this is just a slight inconvenience,
whereas the support for larger object sizes is a very welcome major 
improvement.

Thanks for doing this.

Thomas


On Mon, Nov 23, 2009 at 05:10:44PM -0800, Hervé Pagès wrote:
> Hi Thomas,
> 
> The internals of the XStringSet container changed in BioC 2.5 for two
> reasons. The first is to support bigger objects, i.e. objects that hold
> more than 2^31 letters in total: the limit is now 2^31 letters per
> element, and the maximum number of elements is 2^31, much like standard
> character vectors. The second is to make combining through c() or
> append() more efficient, which is now achieved with no copying of the
> sequence data. reverseComplement(), reverse(), complement() and chartr()
> are currently broken on XStringSet objects that have gone through
> combining because of this change. Most methods that operate on
> XStringSet objects were adapted, but not these 4, for lack of time.
> I'm working on this right now and will post again here when it's fixed.
> Thanks for the reminder and sorry for the inconvenience.
> 
> Cheers,
> H.
> 
> 
> Thomas Girke wrote:
> >Dear List,
> >
> >Is there an explanation for why XStringSet objects that have gone
> >through an append() or c() step now behave differently from those
> >that haven't? I did not observe this problem in the previous
> >R/BioC release.
> >
> >Below is a simple example to reproduce this error.
> >
> >Thanks in advance for your help.
> >
> >Thomas
> >
> >## Example
> >>library(Biostrings)
> >>dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
> >>dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
> >>dset3 <- c(dset1, dset2) # using append() doesn't fix the problem
> >
> >>reverseComplement(dset3)
> >Error in .local(x, ...) : IRanges internal error: length(x) != 1
> >
> >>DNAStringSet(dset3, start=1, end=4)
> >Error in super(x) : Biostrings internal error: length(x@pool) != 1
> >
> >## The problem goes away by doing the following
> >>dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))
> >
> >>reverseComplement(dset3fix)
> >  A DNAStringSet instance of length 6
> >    width seq
> >[1]     9 GTAATATGC
> >[2]     9 GGATCGATT
> >[3]     9 GTAATATGC
> >[4]    11 GTAATATGCGG
> >[5]    11 GGATCGATTTT
> >[6]    11 GTATTATATGC
> >
> >
> >>DNAStringSet(dset3fix, start=1, end=4)
> >  A DNAStringSet instance of length 6
> >    width seq
> >[1]     4 GCAT
> >[2]     4 AATC
> >[3]     4 GCAT
> >[4]     4 CCGC
> >[5]     4 AAAA
> >[6]     4 GCAT
> >
> >
> >>sessionInfo()
> >R version 2.10.0 (2009-10-26)
> >x86_64-unknown-linux-gnu
> >
> >locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> >attached base packages:
> >[1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >other attached packages:
> >[1] Biostrings_2.14.1 IRanges_1.4.3
> >
> >loaded via a namespace (and not attached):
> >[1] Biobase_2.6.0
> >
> >_______________________________________________
> >Bioc-sig-sequencing mailing list
> >Bioc-sig-sequencing at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>


