A Study on Gene Selection and Classification Algorithms for Classification of Microarray Gene Expression Data

Authors

  • Lee Chin Yeo
  • Safaai Deris

DOI:

https://doi.org/10.11113/jt.v43.780

Abstract

The development of microarray technology allows researchers to monitor gene expression on a genomic scale. One of its main applications is the classification of tissue samples as tumor or normal tissue, and gene selection plays an important role prior to that classification. This paper studies numerous combinations of gene selection techniques and classification algorithms for classifying microarray gene expression data. The gene selection techniques are the Fisher criterion, Golub signal-to-noise, the traditional t-test, and the Mann-Whitney rank sum statistic. The classification algorithms are support vector machines (SVMs) with several kernels and k-nearest neighbor (k-NN). The combined techniques are validated with leave-one-out cross-validation (LOOCV), and receiver operating characteristic (ROC) analysis is used to analyze the results. The study demonstrates that selecting genes prior to tissue classification is important for better classification performance. The best combination is the Mann-Whitney rank sum statistic with SVMs, which achieves the highest ROC score of 0.91. This should be of significant value for diagnostic purposes as well as for guiding further exploration of the underlying biology.

Keywords: microarray gene expression data, gene selection, statistical methods, classification algorithms, support vector machines, k-nearest neighbor
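
To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It scores each gene with the four statistics named above, keeps the top-ranked genes, classifies with k-NN under LOOCV, and summarizes the result as an ROC score. Several parts are assumptions made for illustration: the t-statistic uses the unequal-variance (Welch) form, the Mann-Whitney score is centered at zero so that genes can be ranked by absolute value, gene selection is repeated inside every LOOCV fold, all function names are hypothetical, and the toy matrix `X` is invented data. SVMs are omitted because they would need an external library.

```python
from math import sqrt
from statistics import mean, stdev, variance

def golub_s2n(pos, neg):
    """Golub signal-to-noise: mean difference over the sum of standard deviations."""
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))

def fisher_criterion(pos, neg):
    """Fisher criterion: squared mean difference over the sum of variances."""
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))

def welch_t(pos, neg):
    """t-statistic, unequal-variance form (an assumption; the paper may pool variances)."""
    return (mean(pos) - mean(neg)) / sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))

def mann_whitney_u(pos, neg):
    """U statistic: count of (pos, neg) pairs with pos > neg; ties count one half."""
    return sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

def mann_whitney_score(pos, neg):
    """U rescaled to [-0.5, 0.5] so that 0 means no separation (illustrative choice)."""
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg)) - 0.5

def rank_genes(X, y, score, top):
    """Indices of the `top` genes with the largest absolute score."""
    scored = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scored.append((abs(score(pos, neg)), g))
    return [g for _, g in sorted(scored, reverse=True)[:top]]

def knn_score(train_X, train_y, x, genes, k):
    """Fraction of the k nearest training samples (Euclidean, selected genes) labelled 1."""
    def dist(row):
        return sum((row[g] - x[g]) ** 2 for g in genes)
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    return sum(train_y[i] for i in nearest) / k

def loocv_scores(X, y, score, top, k):
    """Hold out each sample in turn; select genes and classify using the rest only."""
    out = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_genes(train_X, train_y, score, top)
        out.append(knn_score(train_X, train_y, X[i], genes, k))
    return out

def roc_auc(scores, y):
    """Area under the ROC curve, computed as the normalized Mann-Whitney U."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))

# Invented toy data: rows are tissue samples, columns are genes.
# Gene 0 separates the classes; genes 1 and 2 are noise.
X = [
    [5.0, 1.0, 2.0],
    [5.5, 2.0, 1.0],
    [6.0, 1.5, 2.5],
    [1.0, 1.2, 2.2],
    [1.5, 2.1, 1.1],
    [0.5, 1.4, 2.4],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = tumor, 0 = normal

for name, fn in [("Fisher", fisher_criterion), ("Golub S2N", golub_s2n),
                 ("t-test", welch_t), ("Mann-Whitney", mann_whitney_score)]:
    auc = roc_auc(loocv_scores(X, y, fn, top=1, k=3), y)
    print(f"{name}: selected gene(s) {rank_genes(X, y, fn, 1)}, LOOCV ROC = {auc:.2f}")
```

Selecting genes inside each fold keeps the held-out sample from influencing which genes are chosen; the abstract does not say whether the study did this, so treat it as a design choice of the sketch. The ROC score here reuses the Mann-Whitney U statistic, since the area under the ROC curve equals U divided by the number of positive-negative pairs.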

Published

2012-02-29

Section

Science and Engineering

How to Cite

A Study on Gene Selection and Classification Algorithms for Classification of Microarray Gene Expression Data. (2012). Jurnal Teknologi, 43(1), 111–124. https://doi.org/10.11113/jt.v43.780