[BioC] help with PubMed Central OAI
Chris Stubben
stubben at lanl.gov
Fri Apr 20 19:33:56 CEST 2012
I've been using EFetch to get some full-text articles from PubMed
Central, which works fine...
library(XML)
url <-
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC2784878"
x <- readLines(url)
doc <- xmlParse(x)  # parse the downloaded lines as XML
xpathSApply(doc, "//abstract", xmlValue)
[1] "The majority of all genes have so far been identified and annotated
systematically through in silico gene finding. Here we report the
finding of 3662 strand-specific transcriptionally active regions (TARs)
in the genome of Bacillus subtilis by the use of tiling arrays.
I recently noticed that the PMC copyright notice says to use the FTP or
OAI service for any "automated" retrieval, so I thought I would try OAI,
but I can't get the same XPath queries to work.
url <-
"http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&metadataPrefix=pmc&identifier=oai:pubmedcentral.nih.gov:2784878"
x2 <- readLines(url)  # will warn about incomplete final line
doc2 <- xmlParse(x2)
xpathSApply(doc2, "//abstract", xmlValue)
list()
This query does work, so I know there's an abstract tag.
table(xpathSApply(doc2, "//*", xmlName))
           abstract                 ack           addr-line
                  1                   1                   1
                aff             article  article-categories
                  1                   1                   1
         article-id        article-meta       article-title
                  3                   1                  79
       author-notes                back                body
                  1                   1                   1
            caption             contrib       contrib-group
                  7                   3                   1
copyright-statement             corresp                date
                  1                   1                   1
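The two results above ("//abstract" returns list(), but xmlName finds an
abstract element) are what XPath does when elements sit in a default XML
namespace. That this applies to the OAI response is an assumption, not
something verified in this message. The sketch below reproduces the
pattern offline with a made-up document; the namespace URI
"http://example.org/ns" and the prefix "x" are hypothetical, and the real
URI would have to be read from the element in question.

```r
library(XML)

# Made-up document with a default namespace (hypothetical URI).
txt <- paste0(
  '<record xmlns="http://example.org/ns">',
  '<article><abstract>Example abstract text.</abstract></article>',
  '</record>'
)
doc <- xmlParse(txt, asText = TRUE)

# Unprefixed names only match elements in no namespace: nothing is found.
plain <- xpathSApply(doc, "//abstract", xmlValue)

# Option 1: bind a prefix to the namespace URI and use it in the query.
prefixed <- xpathSApply(doc, "//x:abstract", xmlValue,
                        namespaces = c(x = "http://example.org/ns"))

# Option 2: match on the local name and ignore the namespace.
by_local <- xpathSApply(doc, "//*[local-name()='abstract']", xmlValue)

# xmlName() reports the local name, so it still lists "abstract".
tags <- xpathSApply(doc, "//*", xmlName)
```

Option 2 needs no knowledge of the URI, but it also matches elements with
the same local name in any other namespace.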
Thanks for any help.
Chris Stubben