P-value correction when evaluating correlation between gene and miRNA expression

First of all, I apologize if the question is very basic; I am taking my first steps in bioinformatics.

Data information

We are evaluating the correlation (using the Pearson, Kendall or Spearman method) between gene expression and miRNA expression using the corAndPvalue function of WGCNA.

The result is a DataFrame with one row for each gene x miRNA combination and the following columns:

Gene     miRNA      Correlation  P-value
Gen_1    miRNA_1     0.959       0.00311
Gen_1    miRNA_2    -0.039       0.1041
Gen_1    miRNA_3    -0.344       0.0021
Gen_2    miRNA_1     0.1333      0.00451
Gen_2    miRNA_2     0.877       0.07311


Considering the huge number of correlation tests we are going to evaluate, we need to adjust the p-values to avoid reporting correlations that arise by chance. Bonferroni does not seem to be the best solution, so we would use the Benjamini-Hochberg (BH) method. The question is:

Should the BH correction for the Gen_1 x miRNA_1 combination consider only the p-values of the combinations that include Gen_1 (Option 1), or the p-values of all gene x miRNA combinations (Option 2)?

For example, let’s assume an expression dataset of 20,000 genes and another of 15,000 miRNAs.

Option 1:

To adjust Gen_1 x miRNA_1 we would use 15,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000).

Option 2:

To adjust Gen_1 x miRNA_1 we would use 300,000,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000, Gen_2 x miRNA_1, Gen_2 x miRNA_2, …, Gen_2 x miRNA_15000 and so on).
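To make the two options concrete, here is a minimal sketch that applies BH both ways to the five example rows from the table above. The helper `bh_adjust` is a hand-written illustration of the standard BH step-up adjustment, not part of WGCNA or any library, and the five rows stand in for the full 20,000 x 15,000 table.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank 1 is the smallest p-value
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy rows copied from the example table: (gene, miRNA, p-value).
rows = [
    ("Gen_1", "miRNA_1", 0.00311),
    ("Gen_1", "miRNA_2", 0.1041),
    ("Gen_1", "miRNA_3", 0.0021),
    ("Gen_2", "miRNA_1", 0.00451),
    ("Gen_2", "miRNA_2", 0.07311),
]

# Option 1: a separate BH family for each gene.
rows_by_gene = defaultdict(list)
for row_index, (gene, _, _) in enumerate(rows):
    rows_by_gene[gene].append(row_index)

option_1 = [None] * len(rows)
for indices in rows_by_gene.values():
    gene_pvalues = [rows[k][2] for k in indices]
    for k, adj in zip(indices, bh_adjust(gene_pvalues)):
        option_1[k] = adj

# Option 2: a single BH family over every gene x miRNA pair.
option_2 = bh_adjust([p for _, _, p in rows])

for (gene, mirna, p), a1, a2 in zip(rows, option_1, option_2):
    print(f"{gene} {mirna} p={p:.5f} per-gene={a1:.6f} global={a2:.6f}")
```

The only difference between the two options is the value of m (the family size) and which p-values are ranked together; the adjustment itself is the same.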

Supplementary question

The documentation of the fdrcorrection function from the Python statsmodels library suggests that for negative correlations (which could be frequent in an mRNA x miRNA correlation analysis) Benjamini-Yekutieli would work better; is that right? Or would the Benjamini-Hochberg method be appropriate for this case?
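For reference, the two procedures differ only by a constant factor. In statsmodels, `fdrcorrection` selects BH with `method='indep'` and Benjamini-Yekutieli (BY) with `method='negcorr'`. As far as I understand, the dependence assumption there concerns the relationship among the tests themselves, not the sign of the correlation coefficients being tested. The sketch below is a hand-written illustration (the function `fdr_adjust` is hypothetical, not the statsmodels implementation) showing that BY is BH multiplied by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, which makes it more conservative.

```python
def fdr_adjust(pvalues, method="bh"):
    """Adjust p-values with Benjamini-Hochberg ("bh") or Benjamini-Yekutieli ("by")."""
    if method not in ("bh", "by"):
        raise ValueError("method must be 'bh' or 'by'")
    m = len(pvalues)
    # BY scales the BH adjustment by c(m) = 1 + 1/2 + ... + 1/m.
    penalty = sum(1.0 / k for k in range(1, m + 1)) if method == "by" else 1.0
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank 1 is the smallest p-value
        running_min = min(running_min, pvalues[idx] * m * penalty / rank)
        adjusted[idx] = running_min
    return adjusted


# The five p-values from the example table in the question.
pvals = [0.00311, 0.1041, 0.0021, 0.00451, 0.07311]
bh = fdr_adjust(pvals, "bh")
by = fdr_adjust(pvals, "by")
harmonic = sum(1.0 / k for k in range(1, len(pvals) + 1))

for p, a_bh, a_by in zip(pvals, bh, by):
    print(f"p={p:.5f} BH={a_bh:.6f} BY={a_by:.6f}")
print(f"c(m) for m={len(pvals)}: {harmonic:.4f}")
```

Because c(m) grows roughly like ln(m), the BY penalty for hundreds of millions of tests is a factor of about 20, so it costs power compared with BH in exchange for validity under arbitrary dependence.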

Any kind of help would be much appreciated, thanks in advance!

Bioinformatics Asked on November 15, 2021

One Answer

I asked the same question on the Cross Validated forum and got an excellent answer!

The important part:

You need to correct for all of the comparisons you are doing. So if that's 300,000,000 comparisons you need to correct for that many multiple comparisons.
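To give a sense of scale, the following sketch computes the per-test thresholds implied by a family of 20,000 x 15,000 tests. The level `alpha = 0.05` is an assumption (the question does not state one), and `bh_threshold` is a hypothetical helper written for illustration.

```python
n_genes = 20_000
n_mirnas = 15_000
alpha = 0.05  # assumed level; not stated in the question

# Every gene x miRNA pair is one test in the family.
m = n_genes * n_mirnas

# Bonferroni compares every p-value against the same fixed threshold.
bonferroni_threshold = alpha / m


def bh_threshold(rank, m, alpha):
    """Largest p-value BH accepts at a given rank (rank 1 = smallest p-value)."""
    return rank * alpha / m


print(f"number of tests: {m:,}")
print(f"Bonferroni threshold: {bonferroni_threshold:.3e}")
for rank in (1, 1_000, 1_000_000):
    print(f"BH threshold at rank {rank:>9,}: {bh_threshold(rank, m, alpha):.3e}")
```

BH rejects every hypothesis up to the largest rank k whose sorted p-value is at most k * alpha / m. At rank 1 this threshold equals the Bonferroni one, and it relaxes as more small p-values accumulate.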

For more information, see the full answer on Cross Validated.

Answered by Genarito on November 15, 2021
