Paper The following article is Open access

Ensemble-support vector machine-random undersampling: Simulation study of multiclass classification for handling high dimensional and imbalanced data

Published under licence by IOP Publishing Ltd
Citation: Nur Silviyah Rahmi 2020 J. Phys.: Conf. Ser. 1613 012064, DOI 10.1088/1742-6596/1613/1/012064


Abstract

Microarray technology measures the expression of tens of thousands of genes in parallel on a large scale. It has been widely applied to predict gene function, identify new subtypes of specific tumors, and classify cancers. However, microarray data are known to have characteristic features such as high dimensionality, small sample size, high noise, and imbalanced class distribution. The Support Vector Machine (SVM) has been widely used and has succeeded in many applications at improving classification performance. To overcome the high dimensionality, we applied the Ensemble-SVM (EnSVM) method. This method groups the features using hierarchical clustering, and each group is then classified. Imbalanced data are a problem in classification because the classifier tends to predict the majority class rather than the minority class. Therefore, Random Undersampling (RUS) is added, giving the EnSVM-RUS method, to reduce the size of the majority class to that of the minority class. We use threefold cross-validation with the Fast Correlation Based Filter (FCBF) as the feature selection method. The multiclass method used is SVM One Against One (OAO). Classification performance is evaluated by accuracy, F-score, G-mean, and running time. We perform a simulation study under several imbalance ratio (IR) scenarios, namely ratios of 1, 5, and 8, to assess the performance of the proposed method. The method is also applied to real DNA microarray data with IR of 4.22, 15.00, and 23.17. The results showed that the EnSVM-RUS-OAO method with 2 clusters had higher performance than the EnSVM-OAO method. Increasing the imbalance ratio does not affect the advantage of the EnSVM-RUS-OAO method over the EnSVM-OAO method. Regarding the kernel, the RBF and polynomial kernels produce higher performance and shorter computation time than the linear kernel.
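The abstract names random undersampling, the imbalance ratio, and the G-mean without defining them. The following is a minimal Python sketch under stated assumptions, not the author's implementation: the function names `imbalance_ratio`, `random_undersample`, and `g_mean` are hypothetical, the imbalance ratio is assumed to be the largest class size divided by the smallest, undersampling is assumed to reduce every class to the smallest class size, and the G-mean is assumed to be the geometric mean of per-class recall. The toy labels are illustrative and are not data from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Assumed definition: largest class size divided by smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))  # sample without replacement
    kept.sort()  # keep the original sample order
    return [samples[i] for i in kept], [labels[i] for i in kept]


def g_mean(y_true, y_pred):
    """Assumed definition: geometric mean of per-class recall."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / total[c] for c in total]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy three-class problem with class sizes 40, 8, and 5 (illustrative only).
labels = ["A"] * 40 + ["B"] * 8 + ["C"] * 5
samples = list(range(len(labels)))
balanced_samples, balanced_labels = random_undersample(samples, labels)
```

In this toy setup the imbalance ratio before undersampling is 40 / 5 = 8, and afterwards each class holds 5 samples. Because a single class with zero recall drives the G-mean to zero, it penalizes a classifier that ignores a minority class, which plain accuracy does not.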


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
