Abstract
Accurate prediction of the phenotypic performance of untested single-cross hybrids allows faster genetic progress of the breeding pool at reduced cost. We propose a prediction method based on ɛ-insensitive support vector machine regression (ɛ-SVR). We give a brief overview of the theoretical background of this fairly new technique and present specific kernel functions based on commonly applied genetic similarity measures for dominant and co-dominant markers. These different marker types can be integrated into a single regression model by means of simple kernel operations. Field trial data from the grain maize breeding programme of the private company RAGT R2n are used to assess the predictive capabilities of the proposed methodology. Prediction accuracies are compared to those of one of today’s best performing prediction methods, which is based on best linear unbiased prediction. Results on our data indicate that the two methods achieve comparable prediction accuracies for several combinations of marker types and traits. The ɛ-SVR framework, however, allows greater flexibility in combining different kinds of predictor variables.
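To make the idea of combining marker types through simple kernel operations more concrete, the following minimal sketch builds one kernel per marker type and merges them with a weighted sum, and also shows the ɛ-insensitive loss that gives ɛ-SVR its name. It is an illustration under stated assumptions, not the procedure used in the article: the choice of the Jaccard coefficient for dominant markers and the simple matching coefficient for co-dominant markers, the equal weighting, the function names and the toy marker profiles are all hypothetical. The kernels are computed between two inbred lines only.

```python
# Illustrative sketch only: the similarity measures, the weight and the toy
# marker profiles are assumptions and are not taken from the article.

def jaccard_kernel(a, b):
    """Jaccard similarity between two binary (band present/absent) profiles."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two profiles without any band are treated as identical.
    return both / either if either else 1.0


def simple_matching_kernel(a, b):
    """Fraction of co-dominant loci at which two lines carry the same allele."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def combined_kernel(line1, line2, weight=0.5):
    """Weighted sum of both kernels; a non-negative weighted sum of valid
    kernels is again a valid kernel."""
    k_dom = jaccard_kernel(line1["dominant"], line2["dominant"])
    k_cod = simple_matching_kernel(line1["codominant"], line2["codominant"])
    return weight * k_dom + (1.0 - weight) * k_cod


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Loss minimised by eps-SVR: residuals smaller than eps cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical marker profiles of two inbred lines.
line_a = {"dominant": [1, 0, 1, 1], "codominant": ["A", "B", "A", "C"]}
line_b = {"dominant": [1, 1, 0, 1], "codominant": ["A", "B", "B", "C"]}

print(combined_kernel(line_a, line_b))
print(eps_insensitive_loss(10.0, 10.05))  # residual lies inside the eps tube
print(eps_insensitive_loss(10.0, 10.5))   # residual lies outside the eps tube
```

A kernel matrix filled with such combined values could then be handed to any ɛ-SVR solver that accepts precomputed kernels. The weight acts as an extra tuning parameter that balances the two marker types.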
Acknowledgments
The authors would like to thank the people from RAGT R2n for their unreserved and open-minded scientific contribution to this research. We are also very grateful to Stijn Vansteelandt, Jan De Riek and Peter Dawyndt for discussions on linear mixed modelling, genotyping by means of AFLP markers and cluster computing.
Additional information
Communicated by M. Cooper.
About this article
Cite this article
Maenhout, S., De Baets, B., Haesaert, G. et al. Support vector machine regression for the prediction of maize hybrid performance. Theor Appl Genet 115, 1003–1013 (2007). https://doi.org/10.1007/s00122-007-0627-9
DOI: https://doi.org/10.1007/s00122-007-0627-9