
Reusable components in decision tree induction algorithms

  • Original Paper

Computational Statistics

Abstract

We propose a generic decision tree framework that supports reusable-component design. The framework decomposes decision tree induction into several sub-problems, identified by analyzing well-known induction algorithms: ID3, C4.5, CART, CHAID, QUEST, GUIDE, CRUISE, and CTREE. In these algorithms, and in several of their partial improvements, we identified reusable components that serve as solutions to the framework's sub-problems and can now be used outside the algorithms they originate from. Combining reusable components allows the replication of the original algorithms, their modification, and the creation of new decision tree induction algorithms. Each original algorithm can outperform the others under specific conditions but may perform poorly when those conditions change; reusable components make it possible to exchange solutions across algorithms and to design new algorithms quickly. The proposed framework for component-based algorithm design enhances the understanding, testing, and usability of decision tree algorithm parts.
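The component idea described above can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows, under assumed names, how one sub-problem (evaluating a candidate split) becomes a pluggable component, so that an ID3/C4.5-style criterion (information gain) and a CART-style criterion (Gini decrease) are interchangeable inside one generic evaluator.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy of a label sample (ID3/C4.5-style impurity)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a label sample (CART-style impurity)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(impurity: Callable[[Sequence[str]], float],
                      parent: Sequence[str],
                      children: List[Sequence[str]]) -> float:
    """Generic split evaluation: parent impurity minus weighted child impurity.
    The `impurity` argument is the exchangeable component."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

# One generic evaluator, two exchangeable components on the same split:
parent = ["a", "a", "a", "b", "b", "b"]
split = [["a", "a", "a"], ["b", "b", "b"]]  # a perfect binary split

info_gain = impurity_decrease(entropy, parent, split)  # ID3/C4.5-style criterion
gini_gain = impurity_decrease(gini, parent, split)     # CART-style criterion
```

Swapping the first argument of `impurity_decrease` replicates the split-selection behavior of different original algorithms without touching the rest of the induction loop, which is the kind of exchange the framework is designed to support.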



Author information

Correspondence to Boris Delibasic.


Cite this article

Suknovic, M., Delibasic, B., Jovanovic, M. et al. Reusable components in decision tree induction algorithms. Comput Stat 27, 127–148 (2012). https://doi.org/10.1007/s00180-011-0242-8
