[Bioc-sig-seq] BSgenome getSeq and unstranded GRanges objects
Anna Terry
anna.terry at csc.mrc.ac.uk
Wed Dec 1 15:52:54 CET 2010
Hi Hervé, Michael,
That's great! I did try searching the message boards, but couldn't find
anything.
Thanks for the help
Anna
On 01/12/10 05:46, Hervé Pagès wrote:
> Hi Anna, Michael,
>
> This is already the case if you use the current version of BSgenome:
>
> > library(BSgenome.Mmusculus.UCSC.mm9)
>
> > gr <- GRanges("chr5", IRanges(125821746, 125821945))
>
> > gr
> GRanges with 1 range and 0 elementMetadata values
>     seqnames                 ranges strand |
>        <Rle>              <IRanges>  <Rle> |
> [1]     chr5 [125821746, 125821945]      * |
>
> seqlengths
>  chr5
>    NA
>
> > getSeq(Mmusculus, gr)
> [1]
> "TGAACGCCCTCCACTCAAAATTCCGTGTCCCTCGGGGCCCTTTGCACTTCCCCCACTCGGAATTCCATATCCCTTTGGGCTTTTGCACACCCTCCATTCTGTGTCCCTCGTGGCCTTTGCACACTGTCCCCTCGGAATTCCATGTCTCCCGGGACCTTTGCACACCCTTCGTACAGAATTCTGTGTCCCTCGAGGCCTTA"
>
> > sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.18.2
> [3] Biostrings_2.19.1 GenomicRanges_1.3.0
> [5] IRanges_1.8.5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0
>
> Cheers,
> H.
>
>
> On 11/30/2010 08:14 PM, Michael Lawrence wrote:
>
>> This is a sensible request, as it would be consistent with the behavior for
>> RangesList and RangedData. Anyone opposed to this change?
>>
>> On Tue, Nov 30, 2010 at 6:43 AM, Anna Terry <anna.terry at csc.mrc.ac.uk> wrote:
>>
>>
>>> Hi,
>>>
>>> Would it be possible for the BSgenome function getSeq to return the +
>>> strand by default when given a GRanges object where the strand is "*" rather
>>> than throw an error? The strand is not known for ChIP-seq regions and so it
>>> is sensible to have the strand as "*" when storing them in a GRanges object.
>>>
>>> Anna
>>>
>>>
>>> > problem.gr
>>> GRanges with 1 range and 6 elementMetadata values
>>>     seqnames                 ranges strand |     score     count    unique
>>>        <Rle>              <IRanges>  <Rle> | <numeric> <integer> <integer>
>>> [1]     chr5 [125821746, 125821945]      * |  97.02651       124       116
>>>     ref.count    height       max
>>>     <integer> <numeric> <numeric>
>>> [1]        10  29.03108 125821846
>>>
>>> seqlengths
>>>       chr1      chr2      chr3 ... chrY_random chrUn_random
>>>  197195432 181748087 159599783 ...    58682461      5900358
>>>
>>>
>>> > problem.seq <- getSeq(Mmusculus, problem.gr)
>>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>>> 'strand' elements must be "+" or "-"
>>> Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString
>>>
>>>
>>> > problem.seq <- getSeq(Mmusculus, problem.gr, strand="+")
>>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>>> 'strand' elements must be "+" or "-"
>>> Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString
>>>
>>>
>>> > problem.seq <- getSeq(Mmusculus, seqnames(problem.gr), start=start(problem.gr), end=end(problem.gr))
>>> # works
>>>
>>>
>>>
>>> > sessionInfo()
>>> R version 2.11.1 (2010-05-31)
>>> x86_64-redhat-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_GB LC_NUMERIC=C LC_TIME=en_GB
>>> [4] LC_COLLATE=en_GB LC_MONETARY=en_GB LC_MESSAGES=en_GB
>>> [7] LC_PAPER=en_GB LC_NAME=en_GB LC_ADDRESS=en_GB
>>> [10] LC_TELEPHONE=en_GB LC_MEASUREMENT=en_GB LC_IDENTIFICATION=en_GB
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] XML_3.2-0 BSgenome.Mmusculus.UCSC.mm9_1.3.16
>>> [3] BSgenome_1.16.5 GenomicRanges_1.0.9
>>> [5] Biostrings_2.16.9 IRanges_1.6.17
>>> [7] rkward_0.5.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0 tools_2.11.1
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>
>
>
--
Anna Terry
Lymphocyte Development
MRC Clinical Sciences Centre
Imperial College Faculty of Medicine
Hammersmith Hospital
Du Cane Road
London W12 0NN
Tel: 0208 3832140
0208 3832145
Email: anna.terry at csc.mrc.ac.uk