[BioC] extracting sequence from a genome
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Mar 15 19:40:27 CET 2012
Hi Iain,
On Thu, Mar 15, 2012 at 2:16 PM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
>
>
> Hi Gentlemen
>
> Thanks for taking the time. Here's where I am just now:
>
> library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome
> library(GenomicRanges)
>
> fl <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords
> gff <- read.table(fl) # as dataframe
>
> names <- gff[,10]
> nms <- gsub(";", "", gsub("\"", "", gsub("ID=\"", "", names))) # a set of nested gsub with regex to leave only the text in the double quotes
>
> gr <- GRanges(seqnames = Rle(paste('chr', gff[,1], sep='')), ranges = IRanges(gff[,4], end = gff[,5], names = nms), strand = Rle(gff[,7]))
>
> seqs <- getSeq(Rnorvegicus, flank(gr, 200))
> names(seqs) <- nms
>
> It's much lighter on it's feet than a loop and a nice intro to the GenomicRanges package for me.
>
> As a follow-up question I'm going to write out the seqs object as fasta and use it in clover for TFBS analysis.
>
>
> I was wondering whether the strand is taken into account when I get the flanking sequence i.e. is the sequence returned from the + or - strand as defined in the GRanges object?
A call to `flank` on a GRanges object is "strand aware" -- is that
what you're asking?
Consider:
R> gr <- GRanges('chr1', IRanges(c(20, 20), width=5), c('+', '-'))
R> flank(gr, 10)
GRanges with 2 ranges and 0 elementMetadata values:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [10, 19] +
[2] chr1 [25, 34] -
R> flank(gr, 10, start=FALSE)
GRanges with 2 ranges and 0 elementMetadata values:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [25, 34] +
[2] chr1 [10, 19] -
getSeq is also strand aware -- it will return the reverse complement
of the sequence in your bounds if its on the `-` strand.
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list