[Bioc-sig-seq] BSgenome getSeq and unstranded GRanges objects

Anna Terry anna.terry at csc.mrc.ac.uk
Wed Dec 1 15:52:54 CET 2010


Hi Hervé, Michael,

That's great! I did try searching the message boards, but couldn't find 
anything.

Thanks for the help
Anna

On 01/12/10 05:46, Hervé Pagès wrote:
> Hi Anna, Michael,
>
> This is already the case if you use the current version of BSgenome:
>
>     >  library(BSgenome.Mmusculus.UCSC.mm9)
>
>     >  gr <- GRanges("chr5", IRanges(125821746, 125821945))
>
>     >  gr
>     GRanges with 1 range and 0 elementMetadata values
>         seqnames                 ranges strand |
>            <Rle>              <IRanges>  <Rle> |
>     [1]     chr5 [125821746, 125821945]      * |
>
>     seqlengths
>      chr5
>        NA
>
>     >  getSeq(Mmusculus, gr)
>     [1] "TGAACGCCCTCCACTCAAAATTCCGTGTCCCTCGGGGCCCTTTGCACTTCCCCCACTCGGAATTCCATATCCCTTTGGGCTTTTGCACACCCTCCATTCTGTGTCCCTCGTGGCCTTTGCACACTGTCCCCTCGGAATTCCATGTCTCCCGGGACCTTTGCACACCCTTCGTACAGAATTCTGTGTCCCTCGAGGCCTTA"
>
>     >  sessionInfo()
>     R version 2.12.0 (2010-10-15)
>     Platform: x86_64-unknown-linux-gnu (64-bit)
>
>     locale:
>      [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>      [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>      [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>      [7] LC_PAPER=en_US.utf8       LC_NAME=C
>      [9] LC_ADDRESS=C              LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.18.2
>     [3] Biostrings_2.19.1                  GenomicRanges_1.3.0
>     [5] IRanges_1.8.5
>
>     loaded via a namespace (and not attached):
>     [1] Biobase_2.10.0
>
> Cheers,
> H.
>
>
> On 11/30/2010 08:14 PM, Michael Lawrence wrote:
>    
>> This is a sensible request, as it would be consistent with the behavior for
>> RangesList and RangedData. Anyone opposed to this change?
>>
>> On Tue, Nov 30, 2010 at 6:43 AM, Anna Terry <anna.terry at csc.mrc.ac.uk> wrote:
>>
>>> Hi,
>>>
>>> Would it be possible for the BSgenome function getSeq to return the +
>>> strand by default when given a GRanges object where the strand is "*" rather
>>> than throw an error?  The strand is not known for ChIP-seq regions and so it
>>> is sensible to have the strand as "*" when storing them in a GRanges object.
>>>
>>> Anna
>>>
>>> > problem.gr
>>> GRanges with 1 range and 6 elementMetadata values
>>>     seqnames                 ranges strand |     score     count    unique
>>>        <Rle>              <IRanges>  <Rle> | <numeric> <integer> <integer>
>>> [1]     chr5 [125821746, 125821945]      * |  97.02651       124       116
>>>     ref.count    height       max
>>>     <integer> <numeric> <numeric>
>>> [1]        10  29.03108 125821846
>>>
>>> seqlengths
>>>           chr1         chr2         chr3 ...  chrY_random chrUn_random
>>>      197195432    181748087    159599783 ...     58682461      5900358
>>>
>>> > problem.seq <- getSeq(Mmusculus, problem.gr)
>>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>>>    'strand' elements must be "+" or "-"
>>> Calls: getSeq ... lapply ->   lapply ->   FUN ->   .extractSeqsFromDNAString
>>>
>>> > problem.seq <- getSeq(Mmusculus, problem.gr, strand="+")
>>> Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
>>>    'strand' elements must be "+" or "-"
>>> Calls: getSeq ... lapply ->   lapply ->   FUN ->   .extractSeqsFromDNAString
>>>
>>> > problem.seq <- getSeq(Mmusculus, seqnames(problem.gr), start=start(problem.gr), end=end(problem.gr))
>>> # works
>>>
>>>
>>> > sessionInfo()
>>> R version 2.11.1 (2010-05-31)
>>> x86_64-redhat-linux-gnu
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB          LC_NUMERIC=C            LC_TIME=en_GB
>>>  [4] LC_COLLATE=en_GB        LC_MONETARY=en_GB       LC_MESSAGES=en_GB
>>>  [7] LC_PAPER=en_GB          LC_NAME=en_GB           LC_ADDRESS=en_GB
>>> [10] LC_TELEPHONE=en_GB      LC_MEASUREMENT=en_GB    LC_IDENTIFICATION=en_GB
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] XML_3.2-0                          BSgenome.Mmusculus.UCSC.mm9_1.3.16
>>> [3] BSgenome_1.16.5                    GenomicRanges_1.0.9
>>> [5] Biostrings_2.16.9                  IRanges_1.6.17
>>> [7] rkward_0.5.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0 tools_2.11.1
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>        
>
>    

-- 
Anna Terry
Lymphocyte Development
MRC Clinical Sciences Centre
Imperial College Faculty of Medicine
Hammersmith Hospital
Du Cane Road
London W12 0NN

Tel:	0208 3832140
	0208 3832145
Email: anna.terry at csc.mrc.ac.uk


