
Robust Pearson correlation coefficient for imbalanced sample size and high dimensional data set


Friday Zinzendoff Okwonu
Owoyi Mildred Chiyeaka
Nor Aishah Ahad
Olimjon Sharipov

Abstract

Datasets in practical applications often vary in sample size and dimension; for example, undersampling or oversampling techniques are often applied to address minority sample size problems. However, formulating the Pearson correlation for imbalanced sample sizes and high dimensional data poses considerable practical challenges. This study addressed the imbalanced sample size problem and proposed a new method that could be used as a dual enabler to solve correlation problems for high dimensional data sets. The mean variance cloning technique (MVCT) is applied to solve the imbalanced sample size problem, and the absolute variance variable selection technique (AVVS) is applied as a transpose enabler to enhance the computation of the Pearson correlation. The study aimed to show how the strength of the relationship between two objects could be determined for an imbalanced sample size and high dimensional data set. The comparative results showed that the MVCT and AVVS Pearson correlations demonstrated robust performance for the imbalanced sample size and high dimensional data set. The simulation results therefore indicate that the two preprocessing techniques (MVCT and AVVS) are enablers that enhance the robust performance of the Pearson correlation. This study concluded that the enhanced Pearson correlation coefficients (AVVS-PCC, MVCT-AVVS-PCC, MVCT-PCC) indicated robust association and are potentially suitable for practical tasks aimed at solving complex practical problems.
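For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard Pearson correlation coefficient only, not of the authors' MVCT or AVVS procedures, whose details are not given in the abstract. The function name `pearson` and the error messages are assumptions made for this sketch. It shows why imbalanced sample sizes are a problem: the coefficient is defined over paired observations, so the two samples must have equal length before it can be computed.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation for two equal-length numeric sequences.

    Illustrative helper (hypothetical name); not the MVCT/AVVS method.
    """
    # The coefficient is defined over pairs (x_i, y_i), so unequal
    # sample sizes cannot be handled without some preprocessing step.
    if len(x) != len(y):
        raise ValueError("Pearson correlation needs paired samples of equal length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson([1, 2, 3], [1, 2])` raises a `ValueError`, which is the situation that a sample-size balancing step such as the MVCT described above is meant to resolve before the correlation is computed.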


Journal Identifiers


eISSN: 1597-6343
print ISSN: 2756-391X