[Bioc-sig-seq] BSgenome getSeq and unstranded GRanges objects

Anna Terry anna.terry at csc.mrc.ac.uk
Tue Nov 30 15:43:00 CET 2010


Hi,

Would it be possible for the BSgenome function getSeq to return the + 
strand by default when given a GRanges object where the strand is "*" 
rather than throw an error?  The strand is not known for ChIP-seq 
regions and so it is sensible to have the strand as "*" when storing 
them in a GRanges object.

Anna

 > problem.gr
GRanges with 1 range and 6 elementMetadata values
     seqnames                 ranges strand |     score     count    unique
<Rle> <IRanges> <Rle> | <numeric> <integer> <integer>
[1]     chr5 [125821746, 125821945]      * |  97.02651       124       116
     ref.count    height       max
<integer> <numeric> <numeric>
[1]        10  29.03108 125821846

seqlengths
          chr1         chr2         chr3 ...  chrY_random chrUn_random
     197195432    181748087    159599783 ...     58682461      5900358

 > problem.seq <- getSeq(Mmusculus, problem.gr)
Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
   'strand' elements must be "+" or "-"
Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString

 > problem.seq <- getSeq(Mmusculus, problem.gr, strand="+")
Error in .extractSeqsFromDNAString(subject, ranges(grg), strand(grg)) :
   'strand' elements must be "+" or "-"
Calls: getSeq ... lapply -> lapply -> FUN -> .extractSeqsFromDNAString

 > problem.seq <- getSeq(Mmusculus, seqnames(problem.gr), 
start=start(problem.gr), end=end(problem.gr))
# works


 > sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-redhat-linux-gnu

locale:
  [1] LC_CTYPE=en_GB          LC_NUMERIC=C            LC_TIME=en_GB
  [4] LC_COLLATE=en_GB        LC_MONETARY=en_GB       LC_MESSAGES=en_GB
  [7] LC_PAPER=en_GB          LC_NAME=en_GB           LC_ADDRESS=en_GB
[10] LC_TELEPHONE=en_GB      LC_MEASUREMENT=en_GB    LC_IDENTIFICATION=en_GB

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.2-0                          BSgenome.Mmusculus.UCSC.mm9_1.3.16
[3] BSgenome_1.16.5                    GenomicRanges_1.0.9
[5] Biostrings_2.16.9                  IRanges_1.6.17
[7] rkward_0.5.3

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 tools_2.11.1



More information about the Bioc-sig-sequencing mailing list