Skip to main content
Log in

Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

In business applications such as direct marketing, decision-makers are required to choose the action which best maximizes a utility function. Cost-sensitive learning methods can help them achieve this goal. In this paper, we introduce Pessimistic Active Learning (PAL). PAL employs a novel pessimistic measure, which relies on confidence intervals and is used to balance the exploration/exploitation trade-off. In order to acquire an initial sample of labeled data, PAL applies orthogonal arrays of fractional factorial design. PAL was tested on ten datasets using a decision tree inducer. A comparison of these results to those of other methods indicates PAL’s superiority.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Blake CL, Merz CJ (1998), UCI repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html

  • Buchner A, Mulvenna M (1998) Discovering internet marketing intelligence through online analytical web usage mining. SIGMOD Record 27(4): 54–61

    Article  Google Scholar 

  • Cestnik B (1990) Estimating probabilities: a crucial task in machine learning. In: Proceedings of the ninth European conference on artificial intelligence, pp 147–149

  • Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2): 201–221

    Google Scholar 

  • Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J Artif Intell Res 4: 129–145

    MATH  Google Scholar 

  • Clarke P (2006) Christmas gift giving involvement. J Consumer Market 23(5): 283–291

    Article  Google Scholar 

  • Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7: 1–30

    MathSciNet  Google Scholar 

  • Domingos P (2005) Mining social networks for viral marketing. IEEE Intell Syst 20(1): 80–82

    Article  MathSciNet  Google Scholar 

  • Fong PWL (1995) A quantitative study of hypothesis selection. In: Twelfth international conference on machine learning, Tahoe City, California, pp 226–234

  • Hedayat AS, Sloane NJA, Stufken J (1999) Orthogonal arrays: theory and applications. Springer-Verlag, NY

    MATH  Google Scholar 

  • Hwang J, Ding A (1997) Prediction intervals for artificial neural networks. J Am Stat Assoc 92: 748–757

    Article  MATH  MathSciNet  Google Scholar 

  • Kaelbling LP (1993) Learning in embedded systems. MIT Press

  • Kaelbling LP, Littman M (1996) Reinforcement learning: a survey. J Artif Intell Res 4: 237–285

    Google Scholar 

  • Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598): 671–680

    Article  MathSciNet  Google Scholar 

  • Kyriakopoulos K, Moorman C (2004) Tradeoffs in marketing exploitation and exploration strategies: the overlooked role of market orientatio. Int J Res Market 21: 219–240

    Article  Google Scholar 

  • Leemis LM, Trivedi KS (1996) A comparison of approximate interval estimators for the Bernoulli parameter. Am Stat 50: 63–68

    Article  MathSciNet  Google Scholar 

  • Levin N, Zahavi J (2005) Data mining for target marketing. In: Maimon O, Rokach L (eds) The data mining and knowledge discovery handbook, pp 1261–1301

  • Lewis D, Gale W (1994) A sequential algorithm for training text classifiers. In: Proceedings of the international ACM-SIGIR conference on research and development in information retrieval, pp 3–12

  • Ling CX, Li C (1998) Data mining for direct marketing: problems and solutions. In: Proceedings 4th international conference on knowledge discovery in databases (KDD-98), New York, pp 73–79

  • Margineantu D (2005) Active cost-sensitive learning. In: Proceedings of the nineteenth international joint conference on artificial intelligence, IJCAI-05

  • Mayer UF, Sarkissian A (2003) Experimental design for solicitation campaigns. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, NY, USA, pp 717–722

  • Montgomery DC (1997) Design and analysis of experiments, 4th edn. Wiley, New York

    MATH  Google Scholar 

  • Pednault E, Abe N, Zadrozny B (2002) Sequential cost-sensitive decision making with reinforcement learning. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 259–268

  • Percus OE, Percus JK (1984) Modified Bayes technique in sequential clinical trials. Comput Biol Med 14: 127–134

    Article  Google Scholar 

  • Petkau AJ (1978) Sequential medical trials for comparing an experimental with a standard treatment. J Am Stat Assoc 73: 328–338

    Article  MathSciNet  Google Scholar 

  • Potharst R, Kaymak U, Pijls W (2002) Neural networks for target selection in direct marketing. In: Smith KA, Gupta JND (eds) Neural networks in business: techniques and applications, Idea Group Publishing, pp 89–110

  • Putten P, Someren M (eds) (2000) CoIL Challenge 2000: The Insurance Case. Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000–09. June 22

  • Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann

  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull Am Math Soc 55: 527–535

    Article  MathSciNet  Google Scholar 

  • Rokach L, Maimon O (2005) Top down induction of decision trees classifiers: a survey. IEEE SMC Trans Part C 35(4): 476–487

    Google Scholar 

  • Rothaermel FT, Deeds DL (2004) Exploration and exploitation alliances in biotechnology: a system of new product development. Strateg Manage J 25(3): 201–217

    Article  Google Scholar 

  • Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the international conference on machine learning

  • Saar-Tsechansky M, Provost F (2004) Active sampling for class probability estimation and ranking. Mach Learn 54(2): 153–178

    Article  MATH  Google Scholar 

  • Saar-Tsechansky M, Provost F (2007) Decision-centric active learning of binary-outcome models. Inform Syst Res 18(1): 4–22

    Article  Google Scholar 

  • Schein AI, Ungar LH (2007) Active learning for logistic regression: an evaluation. Mach Learn 68(3): 235–265

    Article  Google Scholar 

  • Sloane NJA (2007) A library of orthogonal arrays. http://www.research.att.com/~njas/oadir/index.html

  • Sofroniou N, Hutcheson GD (2002) Confidence intervals for the predictions of logistic regression in the presence and absence of a variance-covariance matrix. Understand Stat 1(1): 3–18

    Article  Google Scholar 

  • Strehl AL, Littman ML (2005) A theoretical analysis of model-based interval estimation. In: Proceedings of the 22nd international conference on Machine learning, August 07–11, 2005, Bonn, Germany, pp 856–863

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press. http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html

  • Turney P (2000) Types of cost in inductive concept learning. Workshop on Cost Sensitive Learning at the 17th ICML, Stanford, CA

  • Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2): 161–186

    Article  Google Scholar 

  • Viaene S, Baesens B, De Moor B, VanDen Poel D, Dedene G, Suykens JAK, Vanthienen J, Viaene S, Van Gestel T (2001) Knowledge discovery using least squares support vector machine classifiers: a direct marketing case. Int J Intell Syst 16(9): 1023–1036

    Article  MATH  Google Scholar 

  • Weiss G, Tian T (2008) Maximizing classifier utility when there are data acquisition and modeling costs. Data Min Knowl Disc

  • Wiering M, Schmidhuber J (1998) Efficient model-based exploration. In: Proceedings of the 5th international conference on simulation of adaptive behavior, pp 223–228

  • Yinghui Y (2004) New data mining and marketing approaches for customer segmentation and promotion planning on the Internet, Phd Dissertation, University of Pennsylvania, ISBN 0-496-73213-1

  • Zahavi J, Levin N (1995) Issues and problems in applying neural computing to target marketing. (with N. Levin). J Direct Market 9(1): 7–12

    Google Scholar 

  • Zahavi J, Levin N (1997) Applying neural computing to target marketing. J Direct Market 11(1): 5–22

    Article  Google Scholar 

  • Zadrozny B (2005) One-benefit learning: cost-sensitive learning with restricted cost information. In: Proceedings of the workshop on utility-based data mining at the eleventh ACM SIGKDD international conference on knowledge discovery and data mining

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lior Rokach.

Additional information

Responsible editor: Maytal Saar-Tsechansky.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rokach, L., Naamani, L. & Shmilovici, A. Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns. Data Min Knowl Disc 17, 283–316 (2008). https://doi.org/10.1007/s10618-008-0105-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-008-0105-2

Keywords

Navigation