[Bioc-sig-seq] Extracting DNA sequences from BSgenome.Mmusculus.UCSC.mm9_1.3.11

Hervé Pagès hpages at fhcrc.org
Thu May 28 21:18:44 CEST 2009


Hi Ivan,

Ivan Gregoretti wrote:
> Hello everyone,
> 
> It is very easy to display one sequence of DNA from the mouse genome.
> 
> For example
> 
>> library(BSgenome.Mmusculus.UCSC.mm9)
>> DNAString(Mmusculus$chr1)[100000000:100000050]
>   51-letter "DNAString" instance
> seq: GGACTGCTGTTGCTGATTCATGTTTGATGTTTTAGACTGCTAATATCCTGA
> 
> 
> My question:
> 
> Now let's say I have a BED-like list of genomic ranges like this
> 
>> head(A[ , c("chr", "start", "end")])
>    chr   start     end
> 1 chr1 3644952 3649720
> 2 chr1 4599146 4601342
> 3 chr1 5015865 5018830
> 4 chr1 5072928 5076881
> 5 chr1 5504220 5507065
> 6 chr1 5513886 5516391
> 
> How do I display many sequences from different chromosomes?

   DNAStringSet(sapply(seq_len(nrow(A)),
                       function(i)
                         getSeq(Mmusculus,
                                as.vector(A$chr[i]),
                                start=A$start[i], end=A$end[i])))

I think you have a fairly reasonable use case here, so I'm going to work
on vectorizing getSeq() so you'll be able to do something like:

   getSeq(Mmusculus, as.vector(A$chr), start=A$start, end=A$end)

to get the same thing.
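
To illustrate what "vectorizing" means here, below is a toy sketch in
base R. Everything in it is an assumption made for illustration:
toy_genome is a made-up named character vector standing in for
Mmusculus, B is a made-up table with the same columns as A, and
substring() stands in for getSeq(). It is not BSgenome code.

```r
# Hypothetical stand-in for a genome: one string per "chromosome".
toy_genome <- c(chr1 = "ACGTACGTACGT", chr2 = "TTTTGGGGCCCC")

# Hypothetical BED-like table with the same columns as A.
# chr is kept as character (not factor), which is what as.vector(A$chr)
# achieves in the code above.
B <- data.frame(chr   = c("chr1", "chr2", "chr1"),
                start = c(1, 5, 9),
                end   = c(4, 8, 12),
                stringsAsFactors = FALSE)

# Loop form: one extraction per row, as in the sapply() code above.
looped <- sapply(seq_len(nrow(B)),
                 function(i)
                   substring(toy_genome[[B$chr[i]]], B$start[i], B$end[i]))

# Vectorized form: the whole chr/start/end columns are passed at once.
vectorized <- unname(substring(toy_genome[B$chr], B$start, B$end))

# Both forms should yield the same character vector.
same_result <- identical(looped, vectorized)
```

The point of the vectorized form is that the caller no longer writes the
row-by-row loop; the function takes care of matching each start/end pair
to its chromosome.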

> 
> 
> Another question:
> 
> I wish to add these sequences to my BED-like data.frame as a new
> field. How do I convert them to strings?
> 

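Then don't call DNAStringSet() on what's returned by sapply() in the above
code: it is already a character vector, so you can assign it directly as a
new column of your data.frame.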
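
A minimal sketch of that last step, on made-up data. The names
toy_genome and B are hypothetical, and substring() stands in for
getSeq() so that the example runs without any genome package installed.

```r
# Hypothetical stand-in for a genome: one string per "chromosome".
toy_genome <- c(chr1 = "ACGTACGTACGT", chr2 = "TTTTGGGGCCCC")

# Hypothetical BED-like data.frame with the same columns as A.
B <- data.frame(chr   = c("chr1", "chr2"),
                start = c(2, 1),
                end   = c(5, 4),
                stringsAsFactors = FALSE)

# Extract one plain string per row; sapply() simplifies the result
# to a character vector.
seqs <- sapply(seq_len(nrow(B)),
               function(i)
                 substring(toy_genome[[B$chr[i]]], B$start[i], B$end[i]))

# Add the strings as a new field, plus their lengths as a sanity check.
B$seq   <- seqs
B$width <- nchar(B$seq)
```

If you already hold a DNAStringSet, calling as.character() on it gives
the same kind of plain character vector.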

> 
> In my defense:
> 
> The first question is not covered in the documentation of
> BSgenome.Mmusculus.UCSC.mm9.

Right. But since this is a BSgenome generic question, a more appropriate
place to cover this is in the doc of the BSgenome package itself. I'll
cover this in ?getSeq.

Thanks for your feedback.

H.

> 
> Thank you,
> 
> Ivan
> 
> 
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


