[Bioc-sig-seq] behavior of XStringSet after c() step

Sun Nov 22 00:08:45 CET 2009

Dear List,

Is there an explanation for why XStringSet objects that have gone
through an append() or c() step behave differently from those that
have not? I did not observe this problem in the previous
R/BioC release.

Below is a simple example to reproduce this error.

Thanks in advance for your help.

Thomas

## Example
> library(Biostrings)
> dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
> dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))
> dset3 <- c(dset1, dset2) # using append() doesn't fix the problem

> reverseComplement(dset3)
Error in .local(x, ...) : IRanges internal error: length(x) != 1

> DNAStringSet(dset3, start=1, end=4)
Error in super(x) : Biostrings internal error: length(x@pool) != 1

## The problem goes away after rebuilding the object from character strings
> dset3fix <- DNAStringSet(unlist(strsplit(toString(dset3), ", ")))

> reverseComplement(dset3fix)
  A DNAStringSet instance of length 6
    width seq
[1]     9 GTAATATGC
[2]     9 GGATCGATT
[3]     9 GTAATATGC
[4]    11 GTAATATGCGG
[5]    11 GGATCGATTTT
[6]    11 GTATTATATGC

> DNAStringSet(dset3fix, start=1, end=4)
  A DNAStringSet instance of length 6
    width seq
[1]     4 GCAT
[2]     4 AATC
[3]     4 GCAT
[4]     4 CCGC
[5]     4 AAAA
[6]     4 GCAT
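
## A possibly simpler variant of the same workaround
The rebuild above round-trips through toString() and strsplit(). A minimal sketch of an alternative, assuming the failure comes only from calling c() on the XStringSet objects themselves: convert each set to a character vector first, combine those, and construct one DNAStringSet from the result. The name dset3alt is made up for illustration, and this has not been checked against Biostrings 2.14.1.

```r
library(Biostrings)

dset1 <- DNAStringSet(c("GCATATTAC", "AATCGATCC", "GCATATTAC"))
dset2 <- DNAStringSet(c("CCGCATATTAC", "AAAATCGATCC", "GCATATAATAC"))

# Combine as plain character vectors so that c() never sees an XStringSet,
# then build a single DNAStringSet from the combined vector.
dset3alt <- DNAStringSet(c(as.character(dset1), as.character(dset2)))

reverseComplement(dset3alt)
DNAStringSet(dset3alt, start=1, end=4)
```

Like the toString() version, this copies the sequence data, so it is only a stopgap for small sets.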

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.1 IRanges_1.4.3

loaded via a namespace (and not attached):
[1] Biobase_2.6.0