Abstract
A new approach to data clustering is proposed, in which two or more measures of cluster quality are simultaneously optimized using a multiobjective evolutionary algorithm (EA). For this purpose, the PESA-II EA is adapted for the clustering problem by the incorporation of specialized mutation and initialization procedures, described herein. Two conceptually orthogonal measures of cluster quality are selected for optimization, enabling, for the first time, a clustering algorithm to explore and improve different compromise solutions during the clustering process. Our results, on a diverse suite of 15 real and synthetic data sets – where the correct classes are known – demonstrate a clear advantage to the multiobjective approach: solutions in the discovered Pareto set are objectively better than those obtained when the same EA is applied to optimize just one measure. Moreover, the multiobjective EA exhibits a far more robust level of performance than both the classic k-means and average-link agglomerative clustering algorithms, outperforming them substantially on aggregate.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Supporting material, http://dbk.ch.umist.ac.uk/handl/vienna/
Blake, C., Merz, C.: UCI repository of machine learning databases. Technical report, Department of Information and Computer Sciences, University of California, Irvine (1998), http://, http://www.ics.uci.edu/~mlearn/MLRepository.html
Brucker, P.: On the complexity of clustering problems. In: Optimization and Operations Research, pp. 45–54. Springer, New York (1977)
Cole, R.M.: Clustering with genetic algorithms. Master’s thesis, University of Western Australia, Nedlands 6907, Australia (1998)
Corne, D.W., Jerram, N.R., Knowles, J.D., Oates, M.J.: PESA-II: regionbased selection in evolutionary multiobjective optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2001), pp. 283–290. Morgan Kaufmann Publishers, San Francisco (2001)
Corne, D.W., Knowles, J.D., Oates, M.J.: The Pareto envelope-based selection algorithm for multiobjective optimization. In: Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 839–848. Springer, Heidelberg (2000)
Deb, K.: Multi-objective optimization using evolutionary algorithms. John Wiley & Sons, Chichester (2001)
Demiriz, A., Bennett, K., Embrechts, M.: Semi-supervised clustering using genetic algorithms. Technical report, Rensselaer Polytechnic Institute, Troy, New York (1999)
Falkenauer, E.: Genetic Algorithms and Grouping Problems. John Wiley & Sons, Chichester (1998)
Gablentz, W., Köppen, M., Dimitriadou, E.: Robust clustering by evolutionary computation. In: 5th Online World Conference on Soft Computing in Industrial Applications (WSC5), The Internet (2000)
Halkidi, M., Vazirgiannis, M., Batistakis, I.: Quality scheme assessment in the clustering process. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 265–267. Springer, Heidelberg (2000)
Jain, A.K., Murty, M.N., Flynn, P.: Data clustering: a review. ACM Computing Surveys 31(3), 264–323 (1999)
Law, M., Topchy, A., Jain, A.K.: Multiobjective data clustering. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (June 2004) (to appear)
Lozano, J.A., Larrañaga, P.: Applying genetic algorithms to search for the best hierarchical clustering of a dataset. Pattern Recognition Letters 20, 911–918 (1999)
MacQueen, L.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recognition 33, 1455–1465 (2000)
Radcliffe, N.J.: Equivalence class analysis of genetic algorithms. Complex Systems 5, 183–205 (1991)
Topchy, A., Jain, A.K., Punch, W.: A mixture model for clustering ensembles. In: Proceedings SIAM Conf. on Data Mining (2004) (in press)
van Rijsbergen, C.: Information Retrieval, 2nd edn. Butterworths, London (1979)
Vorhees, E.: The effectiveness and efficiency of agglomerative hierarchical clustering in document retrieval. PhD thesis, Department of Computer Science, Cornell University (1985)
Weisstein, E.W.: Box-and-whisker plot. From MathWorld – A Wolfram Web Resource, http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Handl, J., Knowles, J. (2004). Evolutionary Multiobjective Clustering. In: Yao, X., et al. Parallel Problem Solving from Nature - PPSN VIII. PPSN 2004. Lecture Notes in Computer Science, vol 3242. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30217-9_109
Download citation
DOI: https://doi.org/10.1007/978-3-540-30217-9_109
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23092-2
Online ISBN: 978-3-540-30217-9
eBook Packages: Springer Book Archive