[BioC] GEOquery and parsing SOFT files
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Mon May 25 16:13:21 CEST 2009

Hello,
The getGEO function from GEOquery parses GEO soft files.  With a
particular GSE file (GSE13638), it took over 15 minutes on my
not-so-crappy machine to parse the file (a local file, download time
excluded).  I've written a simple parser in perl, and parsing the same
file and storing the data in a nested hash/array structure takes ca. 2
seconds.  I'm pretty sure getGEO does more essential processing to
organize the data into a GSE object, but even so, there seems to be an
incredibly inefficient implementation underneath.
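
For comparison, here is a sketch of the kind of simple single-pass parser described above. It is not my Perl code and not GEOquery's implementation. It is a minimal Python sketch that assumes the usual SOFT conventions: `^` lines start an entity, `!` lines are attributes, `#` lines describe table columns, and `!..._table_begin` / `!..._table_end` enclose a tab-separated data table whose first line holds the column names. The function name `parse_soft` and the record layout are hypothetical.

```python
from collections import defaultdict


def parse_soft(lines):
    """Parse GEO SOFT text into {entity_type: {accession: record}}.

    Hypothetical sketch: each record holds its '!' attributes (as lists,
    since keys may repeat), its '#' column descriptions, and any
    tab-separated data table found between '!..._table_begin/_end'.
    """
    entities = defaultdict(dict)
    current = None      # record for the most recent '^' entity line
    in_table = False    # True while inside a data table block

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if in_table:
            if line.startswith("!") and line.endswith("_table_end"):
                in_table = False
            elif current["header"] is None:
                current["header"] = line.split("\t")  # first table line = column names
            else:
                current["rows"].append(line.split("\t"))
            continue
        if line.startswith("^"):
            # e.g. "^SAMPLE = GSM1" starts a new entity record
            kind, _, acc = line[1:].partition(" = ")
            current = {"attrs": defaultdict(list), "columns": {},
                       "header": None, "rows": []}
            entities[kind.strip()][acc.strip()] = current
        elif current is None:
            continue  # ignore anything before the first entity line
        elif line.startswith("!") and line.endswith("_table_begin"):
            in_table = True
        elif line.startswith("!"):
            key, _, value = line[1:].partition(" = ")
            current["attrs"][key.strip()].append(value)
        elif line.startswith("#"):
            name, _, desc = line[1:].partition(" = ")
            current["columns"][name.strip()] = desc
    return dict(entities)
```

Any iterable of lines works, e.g. an open file handle, or `gzip.open(path, "rt")` for a compressed download. Each line is touched once and only split on tabs, so the cost grows linearly with file size; that is roughly the baseline a full GSE parse should be compared against.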
I haven't looked at the source code yet, but here's a question:  what is
the likely reason getGEO is so slow?  Is it the parsing itself, or
rather wrapping the data into the appropriate structure?  Where should I
start looking for code to improve?
vQ