[BioC] GEOquery, GSEMatrix parameter and lifecycle of GEO series data
Gustavo Fernández Bayón
gbayon at gmail.com
Wed Jun 27 10:51:53 CEST 2012
Hi everybody.
I am running into quite a few problems while trying to download and parse a dataset of methylation values. These are not technical problems, IMHO: GEOquery works perfectly, and it really makes getting this kind of data an easy task. However, I don't think I fully understand the lifecycle of GEO series data, and I would like to ask on this list for any hints about this behavior so that I can try to fix it.
First, I downloaded and parsed the desired GSE data file using the default value of the GSEMatrix parameter (TRUE). Then I extracted the ExpressionSet and the assay data I was looking for:
library(GEOquery)
my.gse <- getGEO('GSE30870', destdir='/Users/gbayon/Documents/GEO/')
my.expr.set <- my.gse[[1]]
beta.values <- exprs(my.expr.set)
What surprised me at first was seeing many strange values (all containing the string 'NA') among the featureNames of the ExpressionSet:
> head(featureNames(my.expr.set), n=20)
[1] "NA" "cg00000108" "cg00000109" "cg00000165" "NA.1" "NA.2" "NA.3"
[8] "NA.4" "cg00000363" "NA.5" "NA.6" "NA.7" "NA.8" "cg00000734"
[15] "NA.9" "cg00000807" "cg00000884" "NA.10" "NA.11" "NA.12"
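As a quick diagnostic, the placeholder names can be picked out with base R. The sequence "NA", "NA.1", "NA.2", ... is exactly the pattern base R's make.unique() produces from a repeated name, which suggests (but does not prove) that the identifier column of the parsed series matrix held missing values at those rows. This is a minimal sketch; the helper name is_na_placeholder is hypothetical and not part of GEOquery or Biobase.

```r
# Hypothetical helper: flag feature names that look like deduplicated "NA"
# placeholders ("NA", "NA.1", "NA.2", ...) rather than real probe IDs.
is_na_placeholder <- function(ids) {
  grepl("^NA(\\.[0-9]+)?$", ids)
}

# Base R's make.unique() generates this naming pattern from repeated names;
# this call returns "NA" "NA.1" "NA.2".
make.unique(rep("NA", 3))

# Feature names copied from the head() output above
ids <- c("NA", "cg00000108", "cg00000109", "cg00000165", "NA.1", "NA.2",
         "NA.3", "NA.4", "cg00000363", "NA.5", "NA.6", "NA.7", "NA.8",
         "cg00000734", "NA.9", "cg00000807", "cg00000884", "NA.10",
         "NA.11", "NA.12")

# 13 of these 20 names are placeholders
sum(is_na_placeholder(ids))
```

On the full object, which(is_na_placeholder(featureNames(my.expr.set))) would list every affected row, which makes it easier to check whether the problem is confined to particular probes or spread across the whole array.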
If I download an individual GSM from the series, its featureNames are OK. If I download the GSE with GSEMatrix=FALSE, I get a list of GSM data sets, and again the results are fine. This made me suspect the intermediate, pre-parsed series matrix file. I haven't found any information about the lifecycle of this kind of data, that is, how the matrix is built. Is it a manual process, or is it automatic?
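For comparison with the GSEMatrix=FALSE route, the per-sample tables can be assembled into a probe-by-sample matrix, in the spirit of the approach described in the GEOquery vignette. This is a minimal sketch assuming every GSM data table has the standard GEO ID_REF and VALUE columns; the helper name tables_to_matrix is hypothetical. Rows are aligned by matching ID_REF, so samples whose tables list probes in a different order still line up.

```r
# Hypothetical helper: combine a named list of GSM data tables (data frames
# with ID_REF and VALUE columns) into a probe-by-sample matrix.
tables_to_matrix <- function(tabs, probes = tabs[[1]]$ID_REF) {
  probes <- as.character(probes)
  # For each sample, reorder VALUE so its rows follow the order of 'probes';
  # probes missing from a sample become NA.
  cols <- lapply(tabs, function(tab) {
    tab$VALUE[match(probes, as.character(tab$ID_REF))]
  })
  mat <- do.call(cbind, cols)  # column names come from the list names
  rownames(mat) <- probes
  mat
}

# Hypothetical usage with GEOquery (requires network access):
# library(GEOquery)
# gse.full <- getGEO('GSE30870', GSEMatrix = FALSE)
# beta.from.gsms <- tables_to_matrix(lapply(GSMList(gse.full), Table))
```

Comparing rownames(beta.from.gsms) with featureNames(my.expr.set) could help pinpoint which rows of the series matrix lost their probe IDs.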
If it is a manual process, then I guess I will have to contact the person responsible for uploading the data to see if they can fix it. If it is not, I would like to know whether this is related to BioC or, more plausibly, to GEO.
Any help would be appreciated.
Regards,
Gustavo