[Bioc-sig-seq] getSeq with space names as factors vs characters
Janet Young
jayoung at fhcrc.org
Tue Nov 16 03:32:40 CET 2010
sorry - I'd meant to include this too:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.18.1
[3] Biostrings_2.18.0 GenomicRanges_1.2.1
[5] IRanges_1.8.2
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 tools_2.12.0
Janet
On Nov 15, 2010, at 6:30 PM, Janet Young wrote:
> Hi,
>
> I just updated R to 2.12.0 and BioC to the corresponding latest
> version.
>
> I've found some new, possibly unintended behavior in getSeq (Biostrings)
> that's causing a little chaos for me when I run my code with the updated
> BioC. I think I can find a workaround, but I'm also hoping getSeq
> might be fixable fairly easily?
>
> Here's my issue: I'm using getSeq to extract multiple sequences at
> once from the mouse genome, specifying coordinates using RangedData
> objects. That works OK if I use the whole RangedData object, but
> weird things start to happen if I just use subsets of the RangedData
> object (something to do with factors versus characters for space
> names, perhaps, or the function is getting confused with GRanges vs
> RangedData?).
>
> library(BSgenome.Mmusculus.UCSC.mm9)
> library(IRanges)
>
> tempRD <- RangedData(IRanges(start=c(10000001,10000001),
>                              end=c(10000051,10000051)),
>                      space=c("chr1","chr2"))
>
> #### simple getSeq looks good
> getSeq(Mmusculus,tempRD)
> [1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
> [2] "AGGCCAACTTTTAGAGGTTGGCTCTCTCCTTCAATTGCATGTCCAGGGAGC"
>
> ### but if I subset the RangedData it doesn't look so good - I'd
> ### like the following command to give me just one sequence for the
> ### first region specified in tempRD, but instead it gives me that
> ### first region two times
> getSeq(Mmusculus,tempRD[1,])
> [1] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
> [2] "CTCTTACGTTTTATTCCCTCTTTATCTCAGCTTAGATCAGGGTAAACTTTC"
>
> ### also if I have unused space names I get an error
>
> tempRD3 <- RangedData(IRanges(start=c(10000001,10000001,10000001),
>                               end=c(10000051,10000051,10000051)),
>                       space=as.character(c("chr1","chr2","chr3")))
>
> ######
> tempRD4 <- tempRD3[1:2,]
>
> getSeq(Mmusculus,tempRD4)
>
> Error in validObject(.Object) :
> invalid class "GRanges" object: slot lengths are not all equal
> In addition: Warning message:
> In newCompressedList("CompressedSplitDataFrameList", x, splitFactor
> = f, :
> data length is not a multiple of split variable
>
> ### one possible workaround - get rid of the unused space name
> tempRD5 <- RangedData(IRanges(start(tempRD4),end(tempRD4)),
>                       space=as.character(space(tempRD4)))
> getSeq(Mmusculus,tempRD5) #### now this works
>
> #############
>
> Hope that all makes some sense - thanks very much,
>
> Janet
>
>
>
> -------------------------------------------------------------------
>
> Dr. Janet Young (Trask lab)
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung ...at... fhcrc.org
>
> http://www.fhcrc.org/labs/trask/
>
> -------------------------------------------------------------------
>
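
A note on the "factors versus characters" guess in the quoted message: the sketch below uses only base R, with a plain factor standing in for the space names of a RangedData object. It shows that subsetting a factor keeps its unused levels, and that splitting by such a factor produces an empty group. Whether getSeq actually fails for this reason is an assumption, not something confirmed in this thread; the variable names are made up for illustration.

```r
# Illustration only: a plain factor stands in for RangedData space names.
# That getSeq trips over the same thing is an assumption, not a diagnosis.
space_names <- factor(c("chr1", "chr2", "chr3"))

# Subsetting keeps all three levels, even though "chr3" is no longer used.
kept <- space_names[1:2]
print(levels(kept))

# Splitting by the subsetted factor yields one group per level,
# including an empty group for the unused "chr3".
groups <- split(seq_along(kept), kept)
print(sapply(groups, length))

# Round-tripping through character drops the unused level, which mirrors
# the as.character(space(tempRD4)) workaround in the quoted message.
cleaned <- factor(as.character(kept))
print(levels(cleaned))
```

If the same pattern applies to space names, rebuilding the object from character space names (as in tempRD5) would be expected to remove the stale "chr3" entry, which is consistent with that workaround succeeding.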