Background: Detecting correlated gene expression is a fundamental step in characterizing gene function from microarray data. Commonly used methods such as the Pearson correlation detect only a fraction of the interactions between genes or their products. However, the performance of correlation analysis can be improved significantly, either by supplying additional biological information or by combining correlation with other techniques that extract different mathematical or statistical properties of gene expression from microarray data. In this article, I test the performance of three correlation methods (the Pearson correlation, the Spearman rank correlation, and the Mutual Information approach) in detecting protein-protein interactions, and I examine the properties of these techniques when they are used together. I also develop a new correlation measure that can be used alongside the other measures to improve predictive power.
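As a rough illustration of the three measures named above, the following sketch computes each of them for one pair of expression profiles. It is not the implementation used in this study: the function names, the equal-width histogram estimator for Mutual Information, the choice of 8 bins, and the synthetic curved data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information in bits.

    Assumption: equal-width bins; other estimators give different values.
    """
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # constant input falls into one bin
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )


# Hypothetical toy pair: gene B depends on gene A through a curved relation.
random.seed(0)
gene_a = [random.random() for _ in range(500)]
gene_b = [a ** 4 + random.gauss(0, 0.05) for a in gene_a]

print("Pearson: ", pearson(gene_a, gene_b))
print("Spearman:", spearman(gene_a, gene_b))
print("MI (bits):", mutual_information(gene_a, gene_b))
```

Because the three measures respond differently to the shape of the relationship, comparing them on the same gene pair can indicate whether the dependence is linear, merely monotonic, or more complex.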
Results: Using data from 5,896 microarray hybridizations, I obtained the three measures for 30,499 known protein-interacting pairs in the Human Protein Reference Database (HPRD). The Pearson correlation showed the best sensitivity (0.305), whereas the three measures showed similar specificity (0.240 to 0.257). Comparing the three measures showed that better specificity could be obtained when a high Pearson coefficient was combined with a low Spearman coefficient or low Mutual Information. Using a toy model of two-gene interactions, I found that such combinations of measures were most likely to occur when the relationship between the two expression levels was more strongly curved. I therefore introduced a new measure, termed asymmetric correlation (AC), which quantifies the degree of curvature in the expression levels of two genes as a degree of asymmetry. AC performed better than the other measures, particularly when high specificity was required. Moreover, combining AC with the other measures significantly improved specificity and sensitivity, by up to 50%.
Conclusions: A combination of correlation measures, particularly AC and Pearson correlation, can improve prediction of protein-protein interactions. Further studies are required to assess the biological significance of asymmetry in expression patterns of gene pairs.