[Bioc-sig-seq] getSeq with space names as factors vs characters
Janet Young
jayoung at fhcrc.org
Tue Nov 16 03:30:48 CET 2010
Hi,
I just updated R to 2.12.0 and BioC to the corresponding latest
version.
I've found some new, possibly odd behavior in getSeq (Biostrings) that's
causing a little chaos when I run my existing code with the updated
BioC. I think I can find a workaround, but I'm also hoping getSeq might
be fairly easy to fix.
Here's my issue: I'm using getSeq to extract multiple sequences at
once from the mouse genome, specifying coordinates using RangedData
objects. That works OK if I use the whole RangedData object, but weird
things start to happen if I just use subsets of the RangedData object
(something to do with factors versus characters for space names,
perhaps, or the function is getting confused with GRanges vs
RangedData?).
library(BSgenome.Mmusculus.UCSC.mm9)
library(IRanges)
tempRD <- RangedData(IRanges(start=c(10000001,10000001),
                             end=c(10000051,10000051)),
                     space=c("chr1","chr2"))
#### simple getSeq looks good
getSeq(Mmusculus,tempRD)
[1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
[2] "AGGCCAACTTTTAGAGGTTGGCTCTCTCCTTCAATTGCATGTCCAGGGAGC"
### but if I subset the RangedData it doesn't look so good - I'd like
### the following command to give me just one sequence for the first
### region specified in tempRD, but instead it gives me that first region
### two times
getSeq(Mmusculus,tempRD[1,])
[1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
[2] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
### also if I have unused space names I get an error
tempRD3 <- RangedData(IRanges(start=c(10000001,10000001,10000001),
                              end=c(10000051,10000051,10000051)),
                      space=as.character(c("chr1","chr2","chr3")))
######
tempRD4 <- tempRD3[1:2,]
getSeq(Mmusculus,tempRD4)
Error in validObject(.Object) :
invalid class "GRanges" object: slot lengths are not all equal
In addition: Warning message:
In newCompressedList("CompressedSplitDataFrameList", x, splitFactor =
f, :
data length is not a multiple of split variable
### one possible workaround - get rid of the unused space name
tempRD5 <- RangedData(IRanges(start(tempRD4),end(tempRD4)),
                      space=as.character(space(tempRD4)))
getSeq(Mmusculus,tempRD5) #### now this works
#############
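In case it helps narrow things down, here is a minimal base-R sketch of the factor behavior that might be involved. It is only an illustration: it assumes the space names end up stored as a factor, which I have not confirmed for RangedData, and the variable names (spaces, subsetSpaces, cleaned) are made up for this example. It uses no Bioconductor code at all.

```r
# Base-R illustration only; nothing here touches RangedData or getSeq.
# Assumption: space names behave like an ordinary factor.
spaces <- factor(c("chr1", "chr2", "chr3"))

# Subsetting a factor keeps every original level, including unused ones.
subsetSpaces <- spaces[1:2]
levels(subsetSpaces)   # "chr3" is still listed as a level
table(subsetSpaces)    # and it shows up with a count of zero

# Going through character and back to factor drops the unused level,
# which mirrors the as.character(space(...)) workaround above.
cleaned <- factor(as.character(subsetSpaces))
levels(cleaned)        # only the levels that are actually used
```

If getSeq loops over the levels rather than over the values actually present, a leftover level like "chr3" could plausibly explain the mismatch in lengths, but that is a guess on my part.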
Hope that all makes some sense - thanks very much,
Janet
-------------------------------------------------------------------
Dr. Janet Young (Trask lab)
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
http://www.fhcrc.org/labs/trask/