[Bioc-sig-seq] BLAST from within R
Martin Morgan
mtmorgan at fhcrc.org
Thu Sep 30 21:25:25 CEST 2010
On 09/30/2010 10:53 AM, Ivan Gregoretti wrote:
> Hello everybody,
>
> How do you perform a BLAST alignment of 1 sequence within R?
> Any pointer would be appreciated.
>
> The sequence in question is about 40kb and the pertinent genome is mm9.
Hi Ivan
One possibility comes from
http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
where one could load XML
library(XML)
then formulate a query
baseUrl <- "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"
query <- paste("QUERY=555&DATABASE=nr&HITLIST_SIZE=10&FILTER=L",
"&EXPECT=10&PROGRAM=blastn", sep="")
url0 <- sprintf("%s?%s&CMD=Put", baseUrl, query)
Then submit the query and extract the return result id (RID) using XML
and XPATH queries (see http://www.w3.org/TR/xpath/, especially section 2.5)
post <- htmlTreeParse(url0, useInternalNodes=TRUE)
rid <- xpathApply(post, '//input[@name="RID"][@id="rid"]',
xmlAttrs)[[1]][["value"]]
and then finally retrieve the result
url1 <- sprintf("%s?RID=%s&FORMAT_TYPE=XML&CMD=Get", baseUrl, rid)
result <- xmlTreeParse(url1, useInternalNodes=TRUE)
what to do now depends on the info you want out, e.g.,
qseq <- xpathApply(result, "//Hsp_qseq", xmlValue)
hseq <- xpathApply(result, "//Hsp_hseq", xmlValue)
midline <- xpathApply(result, "//Hsp_midline", xmlValue)
for (i in 1:5)
cat(qseq[[i]], hseq[[i]], midline[[i]], sep="\n")
Might be fun (though a bit of work, to cast the alignments onto common
reference coordinates) to use this in the new
Biostrings::DNAMultipleAlignment constructor.
Martin
>
> Thank you,
>
> Ivan
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
More information about the Bioc-sig-sequencing
mailing list