# P-value correction when evaluating correlation between gene and miRNA expression

First of all, I apologize if the question is very basic; I am taking my first steps in bioinformatics.

## Data information

We are evaluating the correlation (using the Pearson, Kendall or Spearman method) between gene expression and miRNA expression using the `corAndPvalue` function of WGCNA.

The resulting structure would be a DataFrame with one row for every gene x miRNA combination and the following columns:

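| Gene  | miRNA   | Correlation | P-value |
|-------|---------|-------------|---------|
| Gen_1 | miRNA_1 | 0.959       | 0.00311 |
| Gen_1 | miRNA_2 | -0.039      | 0.1041  |
| Gen_1 | miRNA_3 | -0.344      | 0.0021  |
| Gen_2 | miRNA_1 | 0.1333      | 0.00451 |
| Gen_2 | miRNA_2 | 0.877       | 0.07311 |
| ...   | ...     | ...         | ...     |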


## Question

Considering the huge number of correlation tests we are going to evaluate, we need to adjust the p-values so that we do not report correlations that arise by chance. Bonferroni does not seem to be the best solution, so we would use the Benjamini-Hochberg (BH) method. The question is:

Should the BH correction for the Gen_1 x miRNA_1 combination consider only the p-values of the combinations that include Gen_1 (Option 1), or the p-values of all gene x miRNA combinations (Option 2)?

For example, let's assume an expression dataset of 20,000 genes and another of 15,000 miRNAs.

Option 1:

To adjust Gen_1 x miRNA_1 we would use 15,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000).

Option 2:

To adjust Gen_1 x miRNA_1 we would use 300,000,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000, Gen_2 x miRNA_1, Gen_2 x miRNA_2, …, Gen_2 x miRNA_15000 and so on).

## Supplementary question

The documentation of the `fdrcorrection` function in the Python statsmodels library suggests that Benjamini-Yekutieli would work better for negative correlations (which could be frequent in an mRNA x miRNA correlation analysis). Is that right, or would the Benjamini-Hochberg method be appropriate in this case?
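To make the difference between the two procedures concrete, here is a minimal pure-Python sketch (not the statsmodels implementation). It relies on the fact that Benjamini-Yekutieli is the Benjamini-Hochberg step-up procedure with every adjusted p-value multiplied by the extra factor c(m) = 1 + 1/2 + ... + 1/m, which is what makes it valid under arbitrary dependence between tests. The helper names `fdr_adjust` and `harmonic_number` are made up for this illustration, and the p-values are just the five example rows from the table above.

```python
def fdr_adjust(pvalues, dependence_factor=1.0):
    """Step-up FDR adjustment; returns adjusted p-values in input order.

    dependence_factor=1.0 gives Benjamini-Hochberg;
    dependence_factor=c(m) gives Benjamini-Yekutieli.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps the adjusted values at 1
    # Walk from the largest p-value down so the result stays monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * dependence_factor / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def harmonic_number(m):
    """c(m) = sum of 1/i for i = 1..m, the Benjamini-Yekutieli factor."""
    return sum(1.0 / i for i in range(1, m + 1))


# Example p-values taken from the table in the question.
pvalues = [0.00311, 0.1041, 0.0021, 0.00451, 0.07311]

bh = fdr_adjust(pvalues)
by = fdr_adjust(pvalues, harmonic_number(len(pvalues)))

for p, q_bh, q_by in zip(pvalues, bh, by):
    print(f"p={p:<8} BH={q_bh:.5f} BY={q_by:.5f}")
```

Because c(m) is always at least 1, the Benjamini-Yekutieli values are never smaller than the Benjamini-Hochberg ones, so it is the more conservative choice. As far as I understand the statsmodels documentation, `fdrcorrection` selects between the two with its `method` argument (`'indep'` versus `'negcorr'`), but please check that against the version you have installed.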

Any kind of help would be much appreciated, thanks in advance!

Bioinformatics Asked on November 15, 2021

I asked the same question on the Cross Validated forum and got an excellent answer!

The important part:

You need to correct for all of the comparisons you are doing. So if that's 300,000,000 comparisons you need to correct for that many multiple comparisons.
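As a small illustration of what that means in practice, the sketch below applies the Benjamini-Hochberg adjustment once over all p-values (Option 2) and, only for contrast, separately per gene (Option 1). It is a toy example in plain Python: `bh_adjust` is a helper written for this post, and the data are just the five example rows from the question, not real results.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps the adjusted values at 1
    # Walk from the largest p-value down so the result stays monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# The five example rows from the question: (gene, miRNA, p-value).
rows = [
    ("Gen_1", "miRNA_1", 0.00311),
    ("Gen_1", "miRNA_2", 0.1041),
    ("Gen_1", "miRNA_3", 0.0021),
    ("Gen_2", "miRNA_1", 0.00451),
    ("Gen_2", "miRNA_2", 0.07311),
]

# Option 2: a single correction over every gene x miRNA p-value.
global_adj = bh_adjust([p for _, _, p in rows])

# Option 1 (for contrast only): a separate correction for each gene.
rows_by_gene = defaultdict(list)
for row_index, (gene, _, _) in enumerate(rows):
    rows_by_gene[gene].append(row_index)

per_gene_adj = [0.0] * len(rows)
for indices in rows_by_gene.values():
    gene_adj = bh_adjust([rows[j][2] for j in indices])
    for j, q in zip(indices, gene_adj):
        per_gene_adj[j] = q

for (gene, mirna, p), q_all, q_gene in zip(rows, global_adj, per_gene_adj):
    print(f"{gene} x {mirna}: p={p} all-pairs={q_all:.5f} per-gene={q_gene:.5f}")
```

The two options give different adjusted values for the same pair, because each adjusted p-value depends on how many tests are in the family and on where the p-value ranks within it. With the real data the family for Option 2 would be all 300,000,000 p-values.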

Answered by Genarito on November 15, 2021
