[BioC] frmaTools: error with 'convertPlatform'
Matthew McCall
mccallm at gmail.com
Sat Jun 2 05:25:57 CEST 2012
Guido,
Well I've found the problem, but I'm not sure exactly what the
solution is. The issue is that multiple probes on 1.0 are mapping to
the same probe on 1.1:
> sum(duplicated(map[,1]))
[1] 0
> sum(duplicated(map[,2]))
[1] 1749
I think this may be a feature of the alternative CDF, but I'm not
positive (perhaps someone else can weigh in on whether this is the
case). But that is what is "breaking" the platform conversion.
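The same check can be tried on a toy map. This is only a sketch: the matrix below is made up for illustration (its column names v10 and v11 and its values are assumptions, not the real makeMaps() output), but it has the same shape, one row per probe pair, and it shows how to list which rows share a column-2 entry:

```r
# Toy stand-in for the two-column matrix returned by makeMaps();
# column names and index values are invented for illustration only.
map <- cbind(v10 = c(11L, 12L, 13L, 14L, 15L),
             v11 = c(101L, 102L, 102L, 103L, 103L))

# Count repeated entries in each column (the same check as above,
# applied to both columns at once).
dup_counts <- apply(map, 2, function(col) sum(duplicated(col)))

# Keep every row whose column-2 value also occurs in another row.
shared <- map[, 2] %in% map[duplicated(map[, 2]), 2]
dup_rows <- map[shared, , drop = FALSE]

print(dup_counts)
print(dup_rows)
```

On the real object, the last few lines applied to map would list the 1.0 probes that point at a shared 1.1 probe, which may help decide whether the duplicates come from the alternative CDF.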
Sorry I couldn't be of more help.
Best,
Matt
On Fri, Jun 1, 2012 at 6:04 PM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
> Hi,
> I uploaded it here:
> https://sendit.wur.nl/Download.aspx?id=cb769829-a7e5-4f7f-9311-290df518ce5d
>
> Guido
>
> -----Original Message-----
> From: Matthew McCall [mailto:mccallm at gmail.com]
> Sent: Friday, June 01, 2012 22:04
> To: Hooiveld, Guido
> Cc: bioconductor (bioconductor at stat.math.ethz.ch)
> Subject: Re: frmaTools: error with 'convertPlatform'
>
> Guido,
>
> Thanks for the line by line results. Can you send me the map object -- the result of: map <- makeMaps(new.platform, old.platform)?
>
> Best,
> Matt
>
> On Fri, Jun 1, 2012 at 3:53 PM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
>> Hi Matt,
>> Thanks for coming back on this.
>>
>> First of all I am fully aware that I am not using the preferred analysis route for Gene ST arrays (which indeed should go through e.g. oligo or XPS). But the possibilities of your function convertPlatform are so nice I gave it a try with these arrays using the remapped CDFs (which AFAIK are valid CDFs; that is, they conform to all standards).
>>
>> I decided to look at the source code of convertPlatform to manually
>> execute it step-by-step (since the code is not so long), and check the
>> output of each line. By doing so I indeed identified the line where
>> things go wrong. It is happening at the second-to-last line of
>> convertPlatform (i.e. exprs2[index,] <- exprs(object)[pmIndex,])
>>
>>
>> # 1st rename object according to 'nomenclature' used when function convertPlatform is defined
>> # convertPlatform <- function(object, new.platform){........
>>
>>> object <- affy.data
>>> new.platform <- "mogene10stv1mmentrezg"
>>> cleancdfname(cdfName(object))
>> [1] "mogene11stv1mmentrezgcdf"
>>> cdfname <- cleancdfname(cdfName(object))
>>> old.platform <- gsub("cdf","",cdfname)
>>> old.platform
>> [1] "mogene11stv1mmentrezg"
>>> map <- makeMaps(new.platform, old.platform)
>>> head(map)
>>      mogene10stv1mmentrezg mogene11stv1mmentrezg
>> [1,]                831891                213206
>> [2,]                237305                 15731
>> [3,]                 14720                511115
>> [4,]                615715                549916
>> [5,]                362313               1064843
>> [6,]               1080675                271008
>>> tmp <- new("AffyBatch", cdfName=new.platform)
>>> tmp
>> AffyBatch object
>> size of arrays=0x0 features (15 kb)
>> cdf=mogene10stv1mmentrezg (21225 affyids)
>> number of samples=0
>> number of genes=21225
>> annotation=
>>> pns <- probeNames(tmp)
>>> head(pns)
>> [1] "100008567_at" "100008567_at" "100008567_at" "100008567_at" "100008567_at"
>> [6] "100008567_at"
>>
>> # check whether this identical output also occurs when the 'real'
>> # AffyBatch object (i.e. affy.data) is used as input
>>> head(probeNames(affy.data))
>> [1] "100008567_at" "100008567_at" "100008567_at" "100008567_at" "100008567_at"
>> [6] "100008567_at"
>> # yes, same output
>>
>>> index <- unlist(pmindex(tmp))
>>> head(index)
>> 100008567_at1 100008567_at2 100008567_at3 100008567_at4 100008567_at5
>> 831891 237305 14720 615715 362313
>> 100008567_at6
>> 1080675
>>> mIndex <- match(index,map[,1])
>>> head(mIndex)
>> [1] 1 2 3 4 5 6
>>> pmIndex <- map[mIndex,2]
>>> head(pmIndex)
>> [1] 213206 15731 511115 549916 1064843 271008
>>> paste(new.platform,"cdf",sep="")
>> [1] "mogene10stv1mmentrezgcdf"
>>> env <- get(paste(new.platform,"dim",sep=""))
>>
>> # check which environment is defined
>>> paste(new.platform,"dim",sep="")
>> [1] "mogene10stv1mmentrezgdim"
>> #
>>
>>> nc <- env$NCOL
>>> head(nc)
>> [1] 1050
>>> nr <- env$NROW
>>> head(nr)
>> [1] 1050
>>> exprs2 <- matrix(nrow=nc*nr, ncol=length(object))
>>> dim(exprs2)
>> [1] 1102500 23
>> # Note, nr and nc are indeed the dimension of the v1.0 (cartridge) array, as is the number of probes. See my first email.
>>
>>> exprs2[index,] <- exprs(object)[pmIndex,]
>> Error: subscript out of bounds
>>>
>> ^^^ here it goes wrong. I *think* this is related to the fact that the v1.1 array (GeneTitan) is rectangular...
>> Compare dimensions of newly created expression v1.0 matrix:
>>> dim(exprs2)
>> [1] 1102500 23
>> With that of the input v1.1 expression matrix:
>>> dim(exprs(object))
>> [1] 1178100 23
>>>
>> Number of arrays match, but number of probes not...
>>
>> To me it naively looks as if the probes of the v1.1 array that do not match, or are not present on, the v1.0 array have to be deleted...??
>>
>> Thanks again for looking into this,
>> Guido
>>
>> BTW: if needed I can send you some CEL files from both platforms.
>>
>> -----Original Message-----
>> From: Matthew McCall [mailto:mccallm at gmail.com]
>> Sent: Friday, June 01, 2012 18:19
>> To: Hooiveld, Guido
>> Cc: bioconductor (bioconductor at stat.math.ethz.ch)
>> Subject: Re: frmaTools: error with 'convertPlatform'
>>
>> Guido,
>>
>> The frma and frmaTools packages use oligo (rather than AffyBatch) objects for the ST arrays, so what you're trying to do is a bit outside the intended functionality. I would also caution you against combining data from different platforms as probe behavior can change quite a bit.
>>
>> That said, we can see whether there's some simple modification that could let you try out what you'd like. Can you figure out at what point in the convertPlatform function the error pops up?
>>
>> Best,
>> Matt
>>
>>
>>
>> On Fri, Jun 1, 2012 at 8:20 AM, Hooiveld, Guido <Guido.Hooiveld at wur.nl> wrote:
>>> Hi,
>>>
>>> I would like to use the function 'convertPlatform' (from the library
>>> frmaTools) to convert an AffyBatch object from the MoGene ST v1.1
>>> (GeneTitan array) format into that of the MoGene ST v1.0 format
>>> (cartridge array), but I run into an error. The reason that I would
>>> like to convert that AffyBatch object is that I would like to combine
>>> 2 experiments performed on those 2 platforms so I can normalize them
>>> together.
>>>
>>>
>>>
>>> In principle the content of the arrays is the same, that is the
>>> probeSETS should be identical, but the design and number of probes
>>> are different: the v1.0 array (cartridge) is square (1050cols x
>>> 1050rows) whereas the v1.1 array is rectangular (990cols x 1190rows).
>>> I think this may be related to the error I experience. Note also that
>>> I would like to use a remapped CDF.
>>>
>>>
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>>
>>> Guido
>>>
>>>
>>>
>>>
>>>
>>>> affy.data <- ReadAffy(cdfname="mogene11stv1mmentrezg")
>>>
>>>> affy.data
>>>
>>> Loading required package: AnnotationDbi
>>>
>>>
>>>
>>> AffyBatch object
>>>
>>> size of arrays=1190x990 features (25 kb)
>>>
>>> cdf=mogene11stv1mmentrezg (21225 affyids)
>>>
>>> number of samples=23
>>>
>>> number of genes=21225
>>>
>>> annotation=mogene11stv1mmentrezg
>>>
>>> notes=
>>>
>>>> object.conv <- convertPlatform(affy.data, "mogene10stv1mmentrezg")
>>>
>>> Loading required package: mogene10stv1mmentrezgprobe
>>>
>>> Loading required package: mogene11stv1mmentrezgprobe
>>>
>>>
>>>
>>>
>>>
>>> Attaching package: 'mogene10stv1mmentrezgcdf'
>>>
>>>
>>>
>>> The following object(s) are masked from 'package:mogene11stv1mmentrezgcdf':
>>>
>>>
>>>
>>> i2xy, xy2i
>>>
>>>
>>>
>>> Error in convertPlatform(affy.data, "mogene10stv1mmentrezg") :
>>>
>>> subscript out of bounds
>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Some maybe relevant array characteristics:
>>>
>>>> library(affxparser)
>>>
>>>> GeneSTv1.0 <- readCelHeader("MouseTP_Brain_01_mGENE.CEL")
>>>
>>>> GeneSTv1.0
>>>
>>> $filename
>>>
>>> [1] "./MouseTP_Brain_01_mGENE.CEL"
>>>
>>>
>>>
>>> $version
>>>
>>> [1] 1
>>>
>>>
>>>
>>> $cols
>>>
>>> [1] 1050
>>>
>>>
>>>
>>> $rows
>>>
>>> [1] 1050
>>>
>>>
>>>
>>> $total
>>>
>>> [1] 1102500
>>>
>>> <<SNIP>>
>>>
>>>
>>>
>>>> GeneSTv1.1 <- readCelHeader("MouseBrain_1.CEL")
>>>
>>>> GeneSTv1.1
>>>
>>> $filename
>>>
>>> [1] "./MouseBrain_1.CEL"
>>>
>>>
>>>
>>> $version
>>>
>>> [1] 1
>>>
>>>
>>>
>>> $cols
>>>
>>> [1] 990
>>>
>>>
>>>
>>> $rows
>>>
>>> [1] 1190
>>>
>>>
>>>
>>> $total
>>>
>>> [1] 1178100
>>>
>>> <<SNIP>>
>>>
>>>
>>>
>>>> sessionInfo()
>>>
>>> R version 2.15.0 (2012-03-30)
>>>
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>>
>>>
>>> locale:
>>>
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>
>>> [7] LC_PAPER=C LC_NAME=C
>>>
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>>
>>>
>>> attached base packages:
>>>
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>>
>>>
>>> other attached packages:
>>>
>>> [1] frmaTools_1.8.0 affy_1.34.0 Biobase_2.16.0
>>> BiocGenerics_0.2.0
>>>
>>>
>>>
>>> loaded via a namespace (and not attached):
>>>
>>> [1] affyio_1.24.0 BiocInstaller_1.4.4 DBI_0.2-5
>>>
>>> [4] preprocessCore_1.18.0 zlibbioc_1.2.0
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------
>>>
>>> Guido Hooiveld, PhD
>>>
>>> Nutrition, Metabolism & Genomics Group
>>>
>>> Division of Human Nutrition
>>>
>>> Wageningen University
>>>
>>> Biotechnion, Bomenweg 2
>>>
>>> NL-6703 HD Wageningen
>>>
>>> the Netherlands
>>>
>>> tel: (+)31 317 485788
>>>
>>> fax: (+)31 317 483342
>>>
>>> email: guido.hooiveld at wur.nl
>>>
>>> internet: http://nutrigene.4t.com
>>>
>>> http://scholar.google.com/citations?user=qFHaMnoAAAAJ
>>>
>>> http://www.researcherid.com/rid/F-4912-2010
>>>
>>>
>>
>>
>>
>> --
>> Matthew N McCall, PhD
>> 112 Arvine Heights
>> Rochester, NY 14611
>> Cell: 202-222-5880
--
Matthew N McCall, PhD
112 Arvine Heights
Rochester, NY 14611
Cell: 202-222-5880