[BioC] help with multiple testing
efthimiosm
efthimiosm at bii.a-star.edu.sg
Mon Jun 25 13:15:23 CEST 2012
Hi all,
My name is Mike and I am a post-doctoral fellow in Bioinformatics. I
have a question about p-value adjustment for multiple testing and I
wonder if someone could give me some advice.
I have 8,256 gene pairs, formed from all possible pairwise combinations
of 129 genes (129 x 128 / 2). For each pair A-B (A different from B)
four values are recorded: number of tumors found in both A and B (TT),
number of tumors only in A (TF), number of tumors only in B (FT), and
number of tumors found neither in A nor in B (FF). The data are in the
form of 2x2 contingency tables, e.g.
Gene 1  Gene 2  TT  TF  FT  FF
g1      g2       5   1   1  27
g1      g3       4   1   1  28
g2      g3       4   2   0  28
...
...
...
Notice that each gene is paired with all the others and is therefore
represented 128 times in this list. I want to find which of the 8,256
gene pairs (tests) show a significant association between rows (in A, not
in A) and columns (in B, not in B) using Fisher's or Barnard's test.
Subsequently I have to adjust the p-values for multiple testing.
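To make the per-pair test concrete, here is a minimal sketch of a two-sided Fisher exact test applied to the three example rows above. It uses only the Python standard library rather than an R routine, the function name fisher_two_sided is hypothetical, and the two-sided definition (summing all tables with the same margins that are no more probable than the observed one) is an assumption about which variant is wanted. Barnard's test is not covered.

```python
from math import comb

def fisher_two_sided(tt, tf, ft, ff):
    """Two-sided Fisher exact p-value for the 2x2 table [[tt, tf], [ft, ff]].

    Assumption: "two-sided" means summing the probabilities of all tables
    with the same margins that are no more probable than the observed one.
    """
    row1 = tt + tf           # tumors counted for gene A
    col1 = tt + ft           # tumors counted for gene B
    n = tt + tf + ft + ff    # total number of tumors
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x tumors in the TT cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(tt)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# The three example rows from the table above.
rows = [("g1", "g2", 5, 1, 1, 27),
        ("g1", "g3", 4, 1, 1, 28),
        ("g2", "g3", 4, 2, 0, 28)]

raw_p = {(a, b): fisher_two_sided(tt, tf, ft, ff)
         for a, b, tt, tf, ft, ff in rows}

for pair, p in raw_p.items():
    print(pair, p)
```

Running the same function over all 8,256 rows would give the vector of raw p-values that the adjustment step takes as input.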
At the 5% level I find approximately 500 significant gene pairs before
adjustment but, as might be expected, all the p-value adjustment
procedures I tried (for independent tests: BH, q-value; for dependent
tests: BY, adaptiveBH and BlaRoq from package "multtest") produce
adjusted p-values > 0.3. I think the problem is that the highly
dependent nature of the data (50% of the genes have a very small number
of mutations, which gives high p-values for all the pairs they generate)
dramatically affects the adjustment procedure.
Is there a better method for adjusting the p-values?
Would it be an appropriate solution to create multiple lists of gene
pairs, in which each gene is represented only once, and then estimate
q-values within each list (multiple q-values for each pair)?
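One possible way to build such lists is the round-robin ("circle") scheduling construction, sketched below. This is only an illustration of how the lists could be formed and says nothing about whether q-values estimated within each list are statistically appropriate. The function name disjoint_pair_lists and the gene labels g1 to g129 are hypothetical. Note that in this particular construction every pair lands in exactly one list; obtaining several q-values per pair would need a different scheme, for example repeated random matchings.

```python
def disjoint_pair_lists(genes):
    """Split all gene pairs into lists in which each gene appears at most once.

    Uses the round-robin (circle) method: one item stays fixed while the
    others rotate. With an odd number of genes a dummy (None) is added,
    and the gene paired with the dummy sits out that list.
    """
    items = list(genes)
    if len(items) % 2 == 1:
        items.append(None)  # dummy partner for an odd number of genes
    n = len(items)
    lists = []
    for _ in range(n - 1):
        # Pair the i-th item from the front with the i-th from the back.
        pairs = [(items[i], items[n - 1 - i]) for i in range(n // 2)]
        lists.append([p for p in pairs if None not in p])
        # Keep the first item fixed and rotate the rest by one position.
        items = [items[0], items[-1]] + items[1:-1]
    return lists

# Hypothetical labels for the 129 genes.
genes = [f"g{i}" for i in range(1, 130)]
rounds = disjoint_pair_lists(genes)

print(len(rounds), "lists of", len(rounds[0]), "pairs each")
```

For 129 genes this gives 129 lists of 64 pairs, which together cover each of the 8,256 pairs once, with one gene left out of each list.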
Thank you,
Mike