[BioC] Newbie methylation and stats question

Mark Robinson mark.robinson at imls.uzh.ch
Tue Jun 19 16:17:23 CEST 2012


Hi Gustavo,

I've inserted a few "reactions" below.

On 19.06.2012, at 12:57, Gustavo Fernández Bayón wrote:

> Hi everybody.
> 
> As a newbie to bioinformatics, I often find it difficult to see how biological knowledge mixes with statistics. I come from the Machine Learning field, and usually have problems with the naming conventions (well, among several other things, I must admit). Besides, I am not an expert in statistics, having used only the bare minimum needed to validate my work.
> 
> Well, let's try to be more precise. One of the topics I am working on most right now is the analysis of methylation array data. As you surely know, the final processed (and normalized) beta values are presented in a p x n matrix, where there are p different probes and n different samples or individuals from which we have obtained the beta values. I am not currently working with the raw data.
> 
> Imagine, for a moment, that we have identified two regions of probes, A and B, with a group of nA probes belonging to A, another group (of nB probes) that belongs to B, and the intersection is empty. Say that we want to find a way to show there is a statistically significant difference between the methylation values of both regions. 
> As far as I have seen in the literature, statistical tests always compare the values of the same probe between case and control groups of individuals or samples, for example when we are trying to find differentiated probes.

You can do differential analyses at the probe level or a regional level.  An example of the latter (perhaps less popular or less established or less known) is:
http://ije.oxfordjournals.org/content/41/1/200.abstract


> However, if I think of directly comparing all the beta values from region A (nA * n values) against the ones in region B (nB * n values) with, say, a t test, I get the suspicion that something is not being done the way it should. My knowledge of Biology and Statistics is still limited and I cannot explain why, but I have the feeling that there is something formally wrong with this approach. Am I right?

First of all, I feel this is an unusual comparison to make.  Presumably, region A and region B are different regions of the genome - what does it mean if methylation levels in region A and B are different? Maybe you could expand on the biological question here?  

Second, if this is the comparison you really want to make, what role do your n samples play here?  Do you have cases and controls?  It may be sensible to fit a model that lets you separate the case/control effects from the effect of interest (A/B).  But again, this needs to be geared to your biological question, which I don't yet understand.
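
To make the role of the n samples concrete, here is a minimal Python sketch of one possible decomposition. It is an assumption for illustration, not an established method: each sample is reduced to a single value (its mean beta in A minus its mean beta in B), so the sample, not the individual beta value, is the unit of replication. Pooling all nA * n values against all nB * n values would instead treat values from the same sample as independent. The toy matrix, the probe indices and the `is_case` labels are invented, and only t statistics are computed (no p-values).

```python
from math import sqrt
from statistics import mean, stdev

def region_means(beta, probe_idx):
    """Per-sample mean beta over the probes of one region.

    beta is a list of p rows (probes), each holding n sample values."""
    n = len(beta[0])
    return [mean(beta[i][j] for i in probe_idx) for j in range(n)]

def paired_t(diffs):
    """One-sample t statistic for H0: mean difference = 0, plus its df."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def welch_t(x, y):
    """Welch two-sample t statistic (unequal variances allowed)."""
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

# Toy data, invented for illustration: 4 probes x 6 samples.
beta = [
    [0.80, 0.82, 0.79, 0.85, 0.88, 0.86],  # probe 0, region A
    [0.78, 0.81, 0.80, 0.86, 0.87, 0.84],  # probe 1, region A
    [0.30, 0.28, 0.33, 0.31, 0.29, 0.32],  # probe 2, region B
    [0.32, 0.27, 0.35, 0.30, 0.31, 0.30],  # probe 3, region B
]
region_a, region_b = [0, 1], [2, 3]  # hypothetical probe indices
is_case = [False, False, False, True, True, True]  # hypothetical labels

# One A-minus-B difference per sample.
diffs = [a - b for a, b in zip(region_means(beta, region_a),
                               region_means(beta, region_b))]

# Overall A vs B effect, tested across samples.
t_region, df = paired_t(diffs)

# Does the A-minus-B difference itself differ between cases and controls?
t_interaction = welch_t([d for d, c in zip(diffs, is_case) if c],
                        [d for d, c in zip(diffs, is_case) if not c])

print(f"A vs B: t = {t_region:.2f} on {df} df")
print(f"case/control difference in (A - B): t = {t_interaction:.2f}")
```

Whether averaging probes within a region is sensible at all depends on the biological question; a mixed model with a per-sample random effect would be a fuller alternative to this two-step summary.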

Best,
Mark

> What I have done in similar experiments has been to find differentiated probes, and then test the proportion of differentiated probes relative to the total number of probes, so I could assign a p-value to show that the region of reference had a significant influence.
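
As a rough illustration of such a proportion test, the sketch below computes a one-sided hypergeometric tail probability (the same calculation as a one-sided Fisher exact test) in Python. This is an assumed formulation, not necessarily the test used in the quoted message, and all counts are hypothetical. It also assumes probes behave like independent draws, which may not hold for neighbouring CpG sites.

```python
from math import comb

def hypergeom_upper_tail(k, m, K, N):
    """P(X >= k) when m probes are drawn without replacement from N probes,
    K of which are differentiated."""
    total = comb(N, m)
    hits = sum(comb(K, x) * comb(N - K, m - x)
               for x in range(k, min(m, K) + 1))
    return hits / total

# Hypothetical counts, for illustration only.
N, K = 1000, 100  # all probes on the array, differentiated probes among them
m, k = 50, 12     # probes in the region, differentiated probes among those

p_value = hypergeom_upper_tail(k, m, K, N)
print(f"P(at least {k} differentiated probes out of {m}) = {p_value:.4g}")
```

A small tail probability would suggest the region holds more differentiated probes than expected by chance, under the independence assumption stated above.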

> Several questions here: what would be a coherent approach to the regions A and B problem stated above? Is there any problem with methylation data I am not aware of which makes only the in-probe analysis valid? Are there any bibliographic references that could help me see the subtleties involved?
> 
> As you can see, concepts are quite interleaved in my mind, so any help would be much appreciated.
> Regards,
> Gustavo
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012


