Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems

Li, Jinyan; Fong, Simon

doi:10.1007/978-3-319-76430-6_1



Abstract

Imbalanced classification is a well-known NP-hard problem in data mining. Because an imbalanced dataset contains more samples from the majority classes than from the minority classes, the resulting classifier tends to over-fit the former and under-fit the latter. Previous solutions focus on increasing the learning sensitivity to the minority classes and/or rebalancing sample sizes before learning. Using swarm intelligence algorithms, we propose a series of unified pre-processing approaches to the imbalanced classification problem. These methods use stochastic swarm heuristics to cooperatively optimize and fuse the distribution of an imbalanced training dataset. As shown in our previously published work, this series of algorithms has an edge in relieving the imbalanced problem. In this book chapter we present an in-depth evaluation of the performance of contemporary swarm rebalancing algorithms. The experimental results show that the proposed algorithms outperform the 17 current algorithms used for comparison. Although some perform better than others, in general these algorithms exhibit superior computational speed, high accuracy and acceptable reliability of the classification model.
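The abstract does not spell out how a swarm heuristic rebalances a training set. As a rough illustration only, the Python sketch below uses a global-best particle swarm optimization (PSO) to search two rebalancing parameters, a SMOTE-style oversampling rate and its neighbour count k, and scores each candidate by Cohen's kappa of a 1-nearest-neighbour classifier on a held-out set. The classifier, the fitness, the bounds, the PSO coefficients, the toy data and every function name (`smote`, `pso`, `fitness`, `make_toy_data`) are assumptions made for this example; it is not the authors' algorithm.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: math.dist(x, p))
        nb = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def predict_1nn(train, x):
    """Toy classifier: label of the nearest labelled training point."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def kappa(y_true, y_pred):
    """Cohen's kappa for 0/1 labels (0.0 if chance agreement is 1)."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pt, pp = sum(y_true) / n, sum(y_pred) / n
    pe = pt * pp + (1 - pt) * (1 - pp)
    return 0.0 if pe == 1 else (po - pe) / (1 - pe)


def fitness(params, majority, minority, val, rng):
    """Hypothetical fitness: kappa on `val` after adding rate * |minority|
    SMOTE samples built with k neighbours."""
    rate, k = params
    k = max(1, min(round(k), len(minority) - 1))
    synth = smote(minority, int(rate * len(minority)), k, rng)
    train = [(x, 0) for x in majority] + [(x, 1) for x in minority + synth]
    preds = [predict_1nn(train, x) for x, _ in val]
    return kappa([y for _, y in val], preds)


def pso(majority, minority, val, bounds, n_particles=8, iters=15, seed=0):
    """Global-best PSO maximising `fitness` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p, majority, minority, val, rng) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # clamp the particle to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i], majority, minority, val, rng)
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


def make_toy_data(seed=1):
    """Hypothetical toy data: 60 majority vs 8 minority 2-D points,
    plus a validation set with a similar imbalance."""
    r = random.Random(seed)
    majority = [(r.gauss(0, 1), r.gauss(0, 1)) for _ in range(60)]
    minority = [(r.gauss(1.5, 0.7), r.gauss(1.5, 0.7)) for _ in range(8)]
    val = ([((r.gauss(0, 1), r.gauss(0, 1)), 0) for _ in range(30)]
           + [((r.gauss(1.5, 0.7), r.gauss(1.5, 0.7)), 1) for _ in range(6)])
    return majority, minority, val


if __name__ == "__main__":
    majority, minority, val = make_toy_data()
    best, best_kappa = pso(majority, minority, val, bounds=[(0.0, 6.0), (1.0, 5.0)])
    print("rate=%.2f k=%d kappa=%.3f" % (best[0], round(best[1]), best_kappa))
```

Because the fitness re-draws synthetic samples on every call, it is stochastic and the reported best kappa is noisy; a fuller version might average several draws or add an undersampling dimension for the majority class.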


Notes

1. https://www.simonjamesfong.net/publications


Acknowledgement

The authors are grateful for financial support from the following research grants: (1) MYRG2016-00069, titled ‘Nature-Inspired Computing and Metaheuristics Algorithms for Optimizing Data Mining Performance’, offered by RDAO/FST, University of Macau, and the Macau SAR government; and (2) FDCT/126/2014/A3, titled ‘A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel’, offered by FDCT of the Macau SAR government.

Author information

Authors and Affiliations

Big Data PDU, Huawei Software Technologies Co., Ltd., Nanjing, China
Jinyan Li & Simon Fong
Department of Computer and Information Science, University of Macau, Taipa, Macau SAR, China
Jinyan Li & Simon Fong


Corresponding author

Correspondence to Simon Fong.

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia
Raymond Wong
CSIRO, Hobart, Tasmania, Australia
Chi-Hung Chi
Faculty of Business and Information Technology, University of Ontario Institute of Technology, Oshawa, Ontario, Canada
Patrick C. K. Hung

Appendix

Table 4 BER of different methods on different datasets

Table 5 G-mean of different methods on different datasets

Table 6 MCC of different methods on different datasets

Table 7 Precision of different methods on different datasets

Table 8 Recall of different methods on different datasets

Table 9 F-measure of different methods on different datasets
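Tables 4 to 9 report six standard binary-classification measures. The sketch below computes them from a 2x2 confusion matrix for readers who want to reproduce such tables. It assumes the minority class is the positive class and returns 0.0 whenever a denominator is zero; both conventions, and the names `imbalance_metrics` and `_safe_div`, are our assumptions rather than details taken from the chapter.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def imbalance_metrics(tp, fn, fp, tn):
    """Binary metrics with the minority class assumed to be the positive class."""
    recall = _safe_div(tp, tp + fn)          # sensitivity, true-positive rate
    specificity = _safe_div(tn, tn + fp)     # true-negative rate
    precision = _safe_div(tp, tp + fp)
    f_measure = _safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)
    ber = 1.0 - (recall + specificity) / 2   # balanced error rate
    mcc = _safe_div(tp * tn - fp * fn,
                    math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"recall": recall, "precision": precision, "f_measure": f_measure,
            "g_mean": g_mean, "ber": ber, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical confusion matrix, not taken from the chapter's tables.
    for name, value in imbalance_metrics(tp=8, fn=2, fp=5, tn=85).items():
        print(f"{name:10s} {value:.4f}")
```

BER is computed as one minus the mean of recall and specificity, which equals the average of the false-negative and false-positive rates; lower is better, whereas higher is better for the other five measures.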


About this chapter

Cite this chapter

Li, J., Fong, S. (2018). Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems. In: Wong, R., Chi, CH., Hung, P. (eds) Behavior Engineering and Applications. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-76430-6_1


DOI: https://doi.org/10.1007/978-3-319-76430-6_1
Published: 11 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76429-0
Online ISBN: 978-3-319-76430-6
eBook Packages: Computer Science, Computer Science (R0)
