
Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Lung cancer is a heterogeneous disease characterized by the uncontrolled growth of cells, and it is a major cause of cancer-related deaths. Early diagnosis is important for treatment and patient survival. In this study, through statistical analysis of cancerous protein sequences, we identified mutated genes associated with the etiology of lung cancer. The most frequently mutated genes were TP53, EGFR, KMT2D, PDE4DIP, ATM, ZNF521, DICER1, CTNNB1, RUNX1T1, SMARCA4, FBXW7, NF1, PIK3CA, STK11, NTRK3, APC, PTPRB, BRCA2, MYH11 and AMER1. We observed that abnormal mutations in these genes contributed to variations in the amino acid composition of the protein sequences. We described this variation in several feature spaces using statistical and physicochemical properties of amino acids. These features provided sufficient discriminative power to develop effective lung cancer classification models (LCCMs). The main advantage of the proposed approach is its effective use of the discriminant information of mutated genes. Experimental results showed that the SVM model performed best in the split amino acid composition feature space. This study thus explores a new direction for early lung cancer classification, using discriminant information about mutated genes revealed through their statistical analysis. We anticipate that the proposed approach will be useful to practitioners and domain experts for early lung cancer diagnosis and prognosis.
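The abstract reports that the SVM model performed best with split amino acid composition (SAAC) features. It does not say how the sequences were split. The sketch below shows SAAC as it is commonly defined. Each sequence is cut into N-terminal, middle and C-terminal segments, and the 20-value amino acid composition of each segment is concatenated into a 60-dimensional vector.

The window sizes `n_term=25` and `c_term=25` are hypothetical defaults chosen for illustration, not the article's settings. The function names are also illustrative.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every feature vector
# uses the same column layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(segment: str) -> list[float]:
    """Return the relative frequency of each standard amino acid in `segment`.

    Non-standard symbols (e.g. 'X', 'U') are ignored. An empty segment
    yields a vector of zeros.
    """
    counts = Counter(segment.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def split_aa_composition(seq: str, n_term: int = 25, c_term: int = 25) -> list[float]:
    """Split amino acid composition (SAAC): 60 features per sequence.

    The sequence is cut into non-overlapping N-terminal, middle and
    C-terminal segments, and the composition of each segment is
    concatenated. The default window sizes are hypothetical assumptions.
    """
    seq = seq.upper()
    n_seg = seq[:n_term]
    # Start of the C-terminal window; never earlier than the end of the
    # N-terminal window, so short sequences do not reuse residues.
    c_start = max(n_term, len(seq) - c_term)
    middle = seq[n_term:c_start]
    c_seg = seq[c_start:]
    return aa_composition(n_seg) + aa_composition(middle) + aa_composition(c_seg)


if __name__ == "__main__":
    # Toy sequence: 25 alanines, 10 glycines, 25 tryptophans.
    toy = "A" * 25 + "G" * 10 + "W" * 25
    features = split_aa_composition(toy)
    print(len(features))  # 60
```

Each vector produced this way could form one row of a feature matrix for an SVM classifier, such as scikit-learn's `sklearn.svm.SVC`. The kernel, hyperparameters and validation scheme used in the article are not stated in this abstract, so that step is left out.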

Author information

Corresponding author

Correspondence to Abdul Majid.

Electronic supplementary material

Supplementary material 1 (docx 41 KB)

About this article

Cite this article

Sattar, M., Majid, A. Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences. Arab J Sci Eng 44, 3197–3211 (2019). https://doi.org/10.1007/s13369-018-3468-8

  • DOI: https://doi.org/10.1007/s13369-018-3468-8
