[BioC] Question about hgu133plus2cdf?
Nicolas Delhomme
delhomme at embl.de
Thu Mar 15 15:11:41 CET 2012
Well, I would not necessarily agree with that. The custom CDF removes a lot of information from the original CDF: about 30% of the probes. So it's better to have one package in R where you can get the complete original data as provided by the manufacturer, don't you agree? That way you can manipulate it however you want; for example, you might not find the way Dai et al. create their packages appropriate for your purpose.
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On 15 Mar 2012, at 14:52, Fabrice Tourre wrote:
> Dear Nico,
>
> Thank you very much for your explanation.
>
> I am wondering why the hgu133plus2cdf in Bioc is not based on the
> custom CDF from Dai et al. It seems that unique mapping is better.
>
> On Thu, Mar 15, 2012 at 9:16 PM, Nicolas Delhomme <delhomme at embl.de> wrote:
>> Dear Fabrice,
>>
>> The hgu133plus2cdf in Bioc is based on the information provided by Affymetrix.
>>
>> The custom CDF from the website you mention contains probes re-aligned to the human genome, and only those probes that have a unique mapping are used. See their publication: Dai et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research (2005) vol. 33 (20) pp. e175.
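The "unique mapping" criterion can be illustrated with a short sketch. This is not Dai et al.'s pipeline; it assumes a hypothetical input of (probe_id, chrom, start) alignment hits already produced by some aligner, and simply keeps the probes that land on exactly one genomic locus.

```python
def unique_probes(alignments):
    """Return the set of probe IDs that align to exactly one genomic locus.

    `alignments` is a hypothetical list of (probe_id, chrom, start) tuples,
    one per alignment hit (input format assumed for illustration only).
    """
    # Collect distinct loci per probe; identical duplicate hits collapse into one.
    loci = {}
    for probe_id, chrom, start in alignments:
        loci.setdefault(probe_id, set()).add((chrom, start))
    # Keep only probes with a single distinct locus.
    return {probe_id for probe_id, hits in loci.items() if len(hits) == 1}


hits = [("p1", "chr1", 100), ("p2", "chr1", 500), ("p2", "chr7", 900), ("p3", "chrX", 42)]
print(sorted(unique_probes(hits)))
```

Here "p2" is discarded because it hits two loci. A real pipeline would also need to decide how to treat strand, mismatches and partial alignments, which this sketch ignores.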
>>
>> That won't solve your SNP problem, but you can use the hgu133plus2probes package, which contains the probe sequences, or the sequences provided by Dai et al. Based on these sequences and their mapping, you should be able to filter out the probes that contain SNPs you're not interested in. The IRanges functionalities might prove helpful for that. Whether you then drop the whole probe set or try to re-create your own CDF is up to you.
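The SNP filtering step boils down to an interval-overlap test between probe coordinates and SNP positions. In R this is what IRanges overlap functions are suited for; the sketch below is only a Python stand-in that uses sorted SNP positions and binary search instead. It assumes hypothetical inputs: probes as (probe_id, probeset_id, chrom, start, end) with 1-based inclusive coordinates, and SNPs as (chrom, position) on the same genome build. It shows both options mentioned above: dropping only the affected probes, or dropping the whole probe set.

```python
from bisect import bisect_left
from collections import defaultdict


def filter_snp_probes(probes, snps, drop_whole_probeset=False):
    """Remove probes whose genomic interval contains at least one SNP.

    probes: hypothetical list of (probe_id, probeset_id, chrom, start, end),
            1-based inclusive coordinates (format assumed for illustration).
    snps:   hypothetical list of (chrom, position).
    """
    # Group and sort SNP positions per chromosome for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    def has_snp(chrom, start, end):
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, start)  # first SNP at or after `start`
        return i < len(positions) and positions[i] <= end

    flagged = {p[0] for p in probes if has_snp(p[2], p[3], p[4])}
    if drop_whole_probeset:
        # Discard every probe belonging to a probe set with any flagged probe.
        bad_sets = {p[1] for p in probes if p[0] in flagged}
        return [p for p in probes if p[1] not in bad_sets]
    # Otherwise discard only the flagged probes themselves.
    return [p for p in probes if p[0] not in flagged]


probes = [("a1", "PS1", "chr1", 100, 124), ("a2", "PS1", "chr1", 200, 224), ("b1", "PS2", "chr2", 50, 74)]
snps = [("chr1", 110), ("chr2", 10)]
print([p[0] for p in filter_snp_probes(probes, snps)])
print([p[0] for p in filter_snp_probes(probes, snps, drop_whole_probeset=True)])
```

Keeping the probe-level result makes it easier to rebuild reduced probe sets later, whereas the whole-probe-set option is the more conservative choice.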
>>
>> If you want to create your own CDF, check the vignette of the makecdfenv package: vignette("makecdfenv"). You might also want to make sure your new probe sets are valid. This paper is a good starting point for that: Lu et al. Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics (2007) vol. 8 pp. 108.
>>
>> HTH,
>>
>> Nico
>>
>> On 15 Mar 2012, at 13:44, Fabrice Tourre wrote:
>>
>>> Dear list,
>>>
>>> I am currently analysing hgu133plus2 arrays. I want a CDF from which
>>> probes containing SNPs have been removed, because I want to remove the
>>> noise caused by single nucleotide polymorphisms (SNPs) in different
>>> samples. I also want to exclude probe sets whose sequences map to
>>> multiple genomic positions.
>>>
>>> In Bioconductor, there is a package hgu133plus2cdf. I also noticed
>>> that there is a website that provides custom CDF files for hgu133plus2.
>>>
>>> The website is:
>>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
>>> HGU133Plus2 (Version 15.0.0, ENTREZG)
>>>
>>> Are these two CDF files the same?
>>>
>>> Or is the hgu133plus2cdf package built directly from the Affymetrix CDF file?
>>>
>>> Thank you very much in advance.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>