[BioC] help with multiple testing
Wolfgang Huber
whuber at embl.de
Mon Jun 25 20:10:58 CEST 2012
Dear Mike
I'd be surprised if this problem were cracked by a brute-force, purely
'statistical' approach. You could try to reduce the number of tests by
first grouping the genes into 'pathways' or functional modules. With a
lot of luck, the data may then be just large enough.
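
To make the arithmetic behind this suggestion concrete, here is a minimal
Python sketch (the thread itself is about R packages; Python is used only
for illustration). It assumes per-gene sets of mutated tumor IDs are
available. The gene names, the gene-to-module mapping and the helper names
n_pairwise_tests and collapse_to_modules are all hypothetical. A module is
counted as hit in a tumor if any of its member genes is hit, which is only
one possible aggregation rule.

```python
from math import comb


def n_pairwise_tests(n_units):
    """Number of unordered pairs (tests) among n_units genes or modules."""
    return comb(n_units, 2)


def collapse_to_modules(mutations, gene_to_module):
    """Merge per-gene tumor sets into per-module tumor sets.

    mutations:      dict gene -> set of tumor IDs mutated in that gene
    gene_to_module: dict gene -> module name (hypothetical annotation)
    """
    modules = {}
    for gene, tumors in mutations.items():
        module = gene_to_module[gene]
        # A module is "hit" in a tumor if any member gene is hit there.
        modules.setdefault(module, set()).update(tumors)
    return modules


# Toy data, invented for illustration only.
mutations = {"g1": {1, 2, 3}, "g2": {2, 3}, "g3": {7}, "g4": {7, 8}}
gene_to_module = {"g1": "modA", "g2": "modA", "g3": "modB", "g4": "modB"}
modules = collapse_to_modules(mutations, gene_to_module)

print(n_pairwise_tests(129))           # pairs among 129 genes
print(n_pairwise_tests(len(modules)))  # pairs among the toy modules
```

With 129 genes there are 8,256 unordered pairs; with, say, 20 modules there
would be 190, so any multiplicity correction has far fewer tests to account
for.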
Best wishes
Wolfgang
Jun/25/12 1:15 PM, efthimiosm scripsit:
> Hi all,
>
> My name is Mike and I am a post-doctoral fellow in Bioinformatics. I
> have a question regarding p-value adjustment for multiple testing and I
> wonder if someone could give me some advice.
>
> I have multiple gene pairs (8,256 in total) formed from all
> possible pairs of 129 genes. For each pair A-B (A different from
> B) four values are recorded: number of tumors found in both A and B
> (TT), number of tumors only in A (TF), number of tumors only in B (FT),
> number of tumors found neither in A nor in B (FF). The data are in the
> form of 2x2 contingency tables. E.g.
>
> Gene 1  Gene 2  TT  TF  FT  FF
> g1      g2       5   1   1  27
> g1      g3       4   1   1  28
> g2      g3       4   2   0  28
> ...
> ...
> ...
>
> Notice that each gene is paired with all others and is thus
> represented 128 times in this list. I want to find which of the 8,256
> gene pairs (tests) show significant associations between rows (in A, not
> in A) and columns (in B, not in B) using Fisher's or Barnard's test.
> Subsequently I have to perform p-value adjustment for multiple testing.
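
As an illustration of the per-pair test described above, here is a minimal
pure-Python sketch of a two-sided Fisher exact test on one TT/TF/FT/FF row.
The function name fisher_exact_two_sided is made up for this example, and
the two-sided rule used (sum the probabilities of all tables that are no
more likely than the observed one) is one common convention, assumed here.
Barnard's test is not covered.

```python
from math import comb


def fisher_exact_two_sided(tt, tf, ft, ff):
    """Two-sided Fisher exact p-value for the 2x2 table [[tt, tf], [ft, ff]]."""
    row1 = tt + tf            # tumors in A
    col1 = tt + ft            # tumors in B
    n = tt + tf + ft + ff     # all tumors
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of TT = k with all margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(tt)
    # Sum over tables at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# First example row from the message: g1, g2 with TT=5, TF=1, FT=1, FF=27.
p_g1_g2 = fisher_exact_two_sided(5, 1, 1, 27)
print(p_g1_g2)

# Two genes that are each hit in a single tumor (34 tumors in total):
# even perfect co-occurrence cannot give a p-value below 1/34.
p_sparse_best = fisher_exact_two_sided(1, 0, 0, 33)
print(p_sparse_best)
```

The second call shows why sparsely mutated genes matter here: with the 34
tumors implied by the example rows, a pair of genes each hit once has a
smallest attainable p-value of 1/34, about 0.029, before any adjustment.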
>
> At 5% I find approximately 500 significant gene pairs but, naturally,
> all p-value adjustment procedures I tried (for independent tests: BH,
> q-value; for dependent tests: BY, adaptiveBH and BlaRoq from package
> "multtest") produce adj. p-values > 0.3. I think the problem is
> that the highly dependent nature of the data (50% of the genes have a very
> small number of mutations, which gives high p-values for all pairs they
> generate) dramatically affects the adjustment procedure.
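
To see why the adjusted values come out so high, the Benjamini-Hochberg
step-up adjustment can be sketched in a few lines of plain Python. This is
an illustrative re-implementation under the usual definition (adjusted
p-value = running minimum of p * m / rank, taken from the largest p-value
downwards), not the code of any of the packages mentioned; the function
name bh_adjust is hypothetical, and the "500 pairs at 5%" figure is taken
from the message as a rough count.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Small made-up example.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(example)

# Rough check with the numbers from the message: if about the 500th
# smallest of 8,256 p-values sits near 0.05, its BH ratio p * m / rank is
boundary_ratio = 0.05 * 8256 / 500
print(boundary_ratio)
```

With roughly 500 of 8,256 raw p-values below 0.05, about 413 would be
expected below 0.05 under the null alone, so the ratio at the 5% boundary
is about 0.83. Adjusted values above 0.3 are therefore what BH would be
expected to give on these counts, whatever the dependence structure.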
>
> Is there a better way (method) to run the p-values adjustment?
>
> Do you think that creating multiple lists of gene pairs, in which each
> gene is represented only once, and then estimating q-values per list
> (giving multiple q-values for each pair) would be an appropriate solution?
>
>
> Thank you,
> Mike
>
--
Best wishes
Wolfgang
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber