[Bioc-devel] idempotent identifier mapping with GSEABase::mapIdentifiers()

Martin Morgan mtmorgan at fhcrc.org
Mon Feb 27 14:50:05 CET 2012


On 02/27/2012 04:45 AM, Vincent Carey wrote:
> I have run into a very similar situation.  Ultimately a uniformization of
> the annotation API will be called for.
> I wonder if a global short-term fixup would get you through this situation?
>
>> org.Hs.egENTREZID = new.env(hash=TRUE)
>> k = mappedkeys(org.Hs.egENSEMBL)  # or any other good source of all keys
>> for (i in seq_along(k)) assign(k[i], k[i], org.Hs.egENTREZID)
>> get("1000", org.Hs.egENTREZID)
> [1] "1000"
>
>
> On Mon, Feb 27, 2012 at 6:25 AM, Robert Castelo <robert.castelo at upf.edu> wrote:
>
>> hi,
>>
>> i collaborate maintaining the packages GSVA and GSVAdata and i have a
>> question about the function mapIdentifiers() from the GSEABase package
>> which i'm going to illustrate through an example.
>>
>>
>> 1. let's build first an ExpressionSet object whose annotation slot is
>> going to point to the human organism-level annotation package
>> org.Hs.eg.db:
>>
>> library(Biobase)
>> library(org.Hs.eg.db)
>>
>> mapped_genes <- mappedkeys(org.Hs.egSYMBOL)
>>
>> exp <- matrix(rnorm(1000), nrow=100,
>>                dimnames=list(mapped_genes[1:100],
>>                              paste("sample", 1:10, sep="")))
>> eset <- new("ExpressionSet", exprs=exp, annotation="org.Hs.eg.db")
>> eset
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 100 features, 10 samples
>>   element names: exprs
>> protocolData: none
>> phenoData: none
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: org.Hs.eg.db
>>
>> 2. now i'm going to load the Broad gene sets stored as a
>> GeneSetCollection object in the experimental data package GSVAdata:
>>
>> library(GSVAdata)
>> data(c2BroadSets)
>> c2BroadSets
>> GeneSetCollection
>>   names: NAKAMURA_CANCER_MICROENVIRONMENT_UP,
>> NAKAMURA_CANCER_MICROENVIRONMENT_DN, ...,
>> ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY (3272 total)
>>   unique identifiers: 5167, 100288400, ..., 57191 (29340 total)
>>   types in collection:
>>     geneIdType: EntrezIdentifier (1 total)
>>     collectionType: BroadCollection (1 total)
>>
>>
>> 3. finally, i'd like to obtain a new GeneSetCollection object whose
>> identifiers have been mapped between the two classes of identifiers in
>> the GeneSetCollection and the ExpressionSet objects.
>>
>> in this case both objects actually work with the same class of
>> identifiers (Entrez), so in fact i don't need to do that but this
>> operation forms part of a piece of code in the package GSVA which i'd
>> like it to work regardless of the kind of annotation package referred to
>> in the ExpressionSet object. i had expected that the function
>> mapIdentifiers() would have some kind of idempotent behavior, but i get
>> the following error:
>>
>> gsc <- mapIdentifiers(c2BroadSets,
>>                        AnnotationIdentifier(annotation(eset)))
>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose
>> = verbose)) :
>>   error in evaluating the argument 'object' in selecting a method for
>> function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv,
>> inherits = FALSE) :
>>   object 'org.Hs.egENTREZID' not found
>>
>>
>> which does not occur if the feature names and annotation of the
>> ExpressionSet correspond to a classical affy chip (e.g. "hgu95av2").

The issue seems to be in GSEABase:::.mapIdentifiers_selectMaps where org 
packages are handled specially, but apparently not in a general enough 
way; I'll look into this. Martin
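
Until that is addressed, one calling-side option is to skip the mapping
when the gene sets already carry Entrez identifiers and the target
annotation is an Entrez-keyed org package. The following is only a rough
sketch: mapIdentifiersIfNeeded() is a hypothetical helper, not a GSEABase
function, and it assumes that any package named like org.*.eg.db is keyed
by Entrez gene identifiers.

```r
library(GSEABase)

## Hypothetical helper (not part of GSEABase): return the collection
## unchanged when no mapping is needed, otherwise defer to mapIdentifiers().
mapIdentifiersIfNeeded <- function(gsc, annotationPkg) {
    ## Assumption: packages named org.*.eg.db (or org.*.eg) use Entrez
    ## gene identifiers as their central key.
    isEntrezOrgPkg <- grepl("^org\\..+\\.eg(\\.db)?$", annotationPkg)

    ## TRUE only if every gene set in the collection is already Entrez-based.
    alreadyEntrez <- all(vapply(gsc, function(gs) {
        is(geneIdType(gs), "EntrezIdentifier")
    }, logical(1)))

    if (isEntrezOrgPkg && alreadyEntrez)
        return(gsc)

    ## Chip-level annotations (e.g. "hgu95av2") take the usual route.
    mapIdentifiers(gsc, AnnotationIdentifier(annotationPkg))
}
```

With the objects from the example below, the call would be
mapIdentifiersIfNeeded(c2BroadSets, annotation(eset)).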

>>
>> i built the object c2BroadSets in the experiment data package GSVAdata
>> by importing the entire xml file from the Broad sets, so i guess it
>> could also be possible that i did something wrong when i built this
>> 'c2BroadSets' object and there's no problem, bug or lacking feature in
>> mapIdentifiers().
>>
>> i look forward to your diagnostic and suggestions in any of these
>> possible directions.
>>
>>
>> thanks,
>> robert.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


