DOI: 10.1145/1143844.1143865 · ICML Conference Proceedings · Article

An empirical comparison of supervised learning algorithms

Published: 25 June 2006

ABSTRACT

A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 1990s. We present a large-scale empirical comparison of ten supervised learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.
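The two calibration methods named in the abstract can be sketched in a few lines. The code below is an illustrative reimplementation, not the authors' code: the function names (`platt_scaling`, `isotonic_fit`) and the small gradient-descent fitter are assumptions made for this example.

```python
# Sketches of the two calibration methods compared in the paper:
# Platt Scaling (Platt, 1999) and Isotonic Regression fit with the
# Pool Adjacent Violators algorithm (Ayer et al., 1955).
import bisect
import math

def platt_scaling(scores, labels, lr=0.05, steps=5000):
    """Fit p = sigmoid(a*s + b) to (score, 0/1 label) pairs by gradient
    descent on the log-loss. Platt (1999) writes the equivalent
    parameterization 1 / (1 + exp(A*f + B))."""
    a, b, n = 0.0, 0.0, len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s   # d(log-loss)/da
            gb += (p - y)       # d(log-loss)/db
        a -= lr * ga / n
        b -= lr * gb / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function from scores to empirical
    probabilities by pooling adjacent blocks that violate monotonicity."""
    blocks = []  # each block: [label_sum, count, leftmost_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # pool adjacent violators: merge while block means do not increase
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]
        ):
            v, c, _ = blocks.pop()
            blocks[-1][0] += v
            blocks[-1][1] += c
    starts = [st for _, _, st in blocks]
    values = [v / c for v, c, _ in blocks]

    def predict(s):
        # step function: value of the last block starting at or below s
        return values[max(bisect.bisect_right(starts, s) - 1, 0)]
    return predict
```

Both take uncalibrated classifier scores and held-out labels and return a mapping from score to calibrated probability; Platt Scaling imposes a sigmoid shape, while Isotonic Regression only assumes the mapping is non-decreasing.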

References

  1. Ayer, M., Brunk, H., Ewing, G., Reid, W., & Silverman, E. (1955). An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics, 26, 641--647.
  2. Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36.
  3. Blake, C., & Merz, C. (1998). UCI repository of machine learning databases.
  4. Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123--140.
  5. Breiman, L. (2001). Random forests. Machine Learning, 45, 5--32.
  6. Buntine, W., & Caruana, R. (1991). Introduction to IND and recursive partitioning (Technical Report FIA-91-28). NASA Ames Research Center.
  7. Caruana, R., & Niculescu-Mizil, A. (2004). Data mining in metric space: An empirical analysis of supervised learning performance criteria. Knowledge Discovery and Data Mining (KDD'04).
  8. Cooper, G. F., Aliferis, C. F., Ambrosino, R., Aronis, J., Buchanan, B. G., Caruana, R., Fine, M. J., Glymour, C., Gordon, G., Hanusa, B. H., Janosky, J. E., Meek, C., Mitchell, T., Richardson, T., & Spirtes, P. (1997). An evaluation of machine learning methods for predicting pneumonia mortality. Artificial Intelligence in Medicine, 9.
  9. Giudici, P. (2003). Applied data mining. New York: John Wiley and Sons.
  10. Gualtieri, A., Chettri, S. R., Cromp, R., & Johnson, L. (1999). Support vector machine classifiers as applied to AVIRIS data. Proc. Eighth JPL Airborne Geoscience Workshop.
  11. Joachims, T. (1999). Making large-scale SVM learning practical. Advances in Kernel Methods.
  12. King, R., Feng, C., & Sutherland, A. (1995). Statlog: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence, 9.
  13. LeCun, Y., Jackel, L. D., Bottou, L., Brunot, A., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Muller, U. A., Sackinger, E., Simard, P., & Vapnik, V. (1995). Comparison of learning algorithms for handwritten digit recognition. International Conference on Artificial Neural Networks (pp. 53--60). Paris: EC2 & Cie.
  14. Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203--228.
  15. Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. Proc. 22nd International Conference on Machine Learning (ICML'05).
  16. Perlich, C., Provost, F., & Simonoff, J. S. (2003). Tree induction vs. logistic regression: A learning-curve analysis. J. Mach. Learn. Res., 4, 211--255.
  17. Platt, J. (1999). Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. Adv. in Large Margin Classifiers.
  18. Provost, F., & Domingos, P. (2003). Tree induction for probability-based rankings. Machine Learning.
  19. Provost, F. J., & Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. Knowledge Discovery and Data Mining (pp. 43--48).
  20. Robertson, T., Wright, F., & Dykstra, R. (1988). Order restricted statistical inference. New York: John Wiley and Sons.
  21. Schapire, R. (2001). The boosting approach to machine learning: An overview. In MSRI Workshop on Nonlinear Estimation and Classification.
  22. Vapnik, V. (1998). Statistical learning theory. New York: John Wiley and Sons.
  23. Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques. Second edition. San Francisco: Morgan Kaufmann.
  24. Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. ICML.
  25. Zadrozny, B., & Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. KDD.


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

                    Copyright © 2006 ACM


Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26% (overall acceptance rate: 140 of 548 submissions, 26%).
