[Bioc-sig-seq] BSgenome getSeq and unstranded GRanges objects
Hervé Pagès
hpages at fhcrc.org
Wed Dec 1 06:46:34 CET 2010
Hi Anna, Michael,
This is already the case if you use the current version of BSgenome:
> library(BSgenome.Mmusculus.UCSC.mm9)
> gr <- GRanges("chr5", IRanges(125821746, 125821945))
> gr
GRanges with 1 range and 0 elementMetadata values
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] chr5 [125821746, 125821945] * |
seqlengths
chr5
NA
> getSeq(Mmusculus, gr)
[1]
"TGAACGCCCTCCACTCAAAATTCCGTGTCCCTCGGGGCCCTTTGCACTTCCCCCACTCGGAATTCCATATCCCTTTGGGCTTTTGCACACCCTCCATTCTGTGTCCCTCGTGGCCTTTGCACACTGTCCCCTCGGAATTCCATGTCTCCCGGGACCTTTGCACACCCTTCGTACAGAATTCTGTGTCCCTCGAGGCCTTA"
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.18.2
[3] Biostrings_2.19.1 GenomicRanges_1.3.0
[5] IRanges_1.8.5
loaded via a namespace (and not attached):
[1] Biobase_2.10.0
Cheers,
H.
On 11/30/2010 08:14 PM, Michael Lawrence wrote:
> This is a sensible request, as it would be consistent with the behavior for
> RangesList and RangedData. Anyone opposed to this change?
>
> On Tue, Nov 30, 2010 at 6:43 AM, Anna Terry<anna.terry at csc.mrc.ac.uk>wrote:
>
>> Hi,
>>
>> Would it be possible for the BSgenome function getSeq to return the +
>> strand by default when given a GRanges object where the strand is "*" rather
>> than throw an error? The strand is not known for ChIP-seq regions and so it
>> is sensible to have the strand as "*" when storing them in a GRanges object.
>>
>> Anna
>>
>>> problem.gr
>> GRanges with 1 range and 6 elementMetadata values
>> seqnames ranges strand | score count unique
>> <Rle> <IRanges> <Rle> |<numeric> <integer> <integer>
>> [1] chr5 [125821746, 125821945] * | 97.02651 124 116
>> ref.count height max
>> <integer> <numeric> <numeric>
>> [1] 10 29.03108 125821846
>>
>> seqlengths
>> chr1 chr2 chr3 ... chrY_random chrUn_random
>> 197195432 181748087 159599783 ... 58682461 5900358
>>
>>> problem.seq<- getSeq(Mmusculus, problem.gr)
>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>> 'strand' elements must be "+" or "-"
>> Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString
>>
>>> problem.seq<- getSeq(Mmusculus, problem.gr, strand="+")
>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>> 'strand' elements must be "+" or "-"
>> Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString
>>
>>> problem.seq<- getSeq(Mmusculus, seqnames(problem.gr), start=start(
>> problem.gr), end=end(problem.gr))
>> # works
>>
>>
>>> sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-redhat-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
>> [4] LC_COLLATE=en_GB LC_MONETARY=en_GB LC_MESSAGES=en_GB
>> [7] LC_PAPER=en_GB LC_NAME=en_GB LC_ADDRESS=en_GB
>> [10] LC_TELEPHONE=en_GB LC_MEASUREMENT=en_GB
>> LC_IDENTIFICATION=en_GB
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] XML_3.2-0 BSgenome.Mmusculus.UCSC.mm9_1.3.16
>> [3] BSgenome_1.16.5 GenomicRanges_1.0.9
>> [5] Biostrings_2.16.9 IRanges_1.6.17
>> [7] rkward_0.5.3
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0 tools_2.11.1
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-sig-sequencing
mailing list