Abstract
This chapter covers Chemometrics and Predictive Modelling. Chemometrics is a multivariate statistical methodology that developed in parallel with, and largely independently of, mainstream statistics. It grew out of the need to analyse chemical measurements with moderate to large numbers of variables, especially when there are more variables than samples (or objects). In recent years, more and more chemometric methods have been absorbed into mainstream multivariate and high-dimensional statistics. In this chapter, we explore the methodological foundations of chemometrics and supplement them with a motivating example that illustrates the methodology. We also provide a comprehensive reference list for readers who want to read more about chemometric methods.
References
Baldovin A, Wu W, Centner V, Jouan-Rimbaud D, Massart DL, Favretto L, Turello A (1996) Feature selection for the discrimination between pollution types with partial least squares modelling. Analyst 121:1603–1608
Baldovin A, Wu W, Massart DL, Turello A (1997) Regularized discriminant analysis (RDA)-modelling for the binary discrimination between pollution types. Chemometr Intell Lab Syst 38:25–37
Brereton RG (1990) Chemometrics, applications of mathematics and statistics to laboratory systems. Ellis Horwood Limited, Chichester
Brown SD, Blank TB, Sum ST, Weyer LG (1994) Chemometrics. Anal Chem 66:315R–359R
Candolfi A, Wu W, Centner V, Massart DL (1998) Comparison of classification approaches applied to NIR-spectra of clinical study lots. J Pharm Biomed Anal 16:1329–1347
Connor SC, Wu W, Sweatman BC, Manini J, Haselden JN, Crowther DJ, Waterfield CJ (2004) The effects of feeding and body weight loss on the 1H NMR-based urine metabolic profiles of male Wistar Han rats: implications for biomarker discovery. Biomarkers 9:156–179
Cordingley HC, Roberts SLL, Tooke P, Armitage JR, Lane PW, Wu W, Wildsmith SE (2003) Multifactorial screening design and analysis of SELDI-TOF ProteinChip array optimisation experiments. Biotechniques 34:364–373
Cutler P, Akuffo EL, Bodnar WM, Briggs DM, Davis JB, Debouck CM, Fox SM, Gibson RA, Gormley DA, Holbrook JD, Hunter AJ, Kinsey EE, Prinjha R, Richardson JC, Roses AD, Smith MA, Tsokanas N, Willé DR, Wu W, Yates JW, Gloger IS (2008) Proteomic identification and early validation of complement 1 inhibitor and pigment epithelium-derived factor: two novel biomarkers of Alzheimer’s disease in human plasma. Proteomics Clin Appl 2:467–477
Czekaj T, Wu W, Walczak B (2005) About kernel latent variable approaches and SVM. J Chemometr 19:341–354
Czekaj T, Wu W, Walczak B (2008) Classification of genomic data: some aspects of feature selection. Talanta 76:564–574
Daszykowski M, Wu W, Nicholls AW, Ball RJ, Walczak B (2007) Identifying potential biomarkers in LC-MS data. J Chemometr 21:292–302
Edgington ES (1964) Randomization tests. J Psychol 57(2):445–449
Edgington ES (1995) Randomization tests, 3rd edn. Wiley, New York
Esbensen K, Geladi P (2005) The start and early history of chemometrics: selected interviews. Part 2. J Chemometr 4:389–412
Geladi P, Esbensen K (2005) The start and early history of chemometrics: selected interviews. Part 1. J Chemometr 4:337–354
Guo Q, Wu W, Massart DL (1999) The robust normal variate transform for pattern recognition with near-infrared data. Anal Chim Acta 382:87–103
Guo Q, Wu W, Questier F, Massart DL, Boucon C, de Jong S (2000) Sequential projection pursuit using genetic algorithms for data mining. Anal Chem 72:2846–2855
Guo Q, Wu W, Massart DL, Boucon C, de Jong S (2001) Feature selection in sequential projection pursuit. Anal Chim Acta 446:85–96
Guo Q, Wu W, Massart DL, Boucon C, de Jong S (2002) Feature selection in principal component analysis of analytical data. Chemometr Intell Lab Syst 61:123–132
Kalivas JH, Roberts N, Sutter JM (1989) Global optimization by simulated annealing with wavelength selection for ultraviolet-visible spectrophotometry. Anal Chem 61:2024–2030
Kvalheim OM (1996) Chemometrics, quality, information and the third waves. Chemometr Intell Lab Syst 33:1–2
Leardi R (2000) Application of genetic algorithm-PLS for feature selection in spectral data sets. J Chemometr 14:643–655
Leardi R, Boggia R, Terrile M (1992) Genetic algorithms as a strategy for feature selection. J Chemometr 6:267–281
Massart DL, Vandeginste BGM, Deming SN, Michotte Y, Kaufman L (1988) Chemometrics: a textbook. Elsevier Science Publishers B.V., Amsterdam
Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J (1997) Handbook of chemometrics and qualimetrics. Part A. In: Data Handling in Science and Technology, vol 20A. Elsevier, Amsterdam
Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J (1998) Handbook of chemometrics and qualimetrics. Part B. In: Data Handling in Science and Technology, vol 20B. Elsevier, Amsterdam
MATLAB 6.1, The MathWorks Inc., Natick, MA, 2000
McInnes IB, Lee JS, Wu W, Giles JT, Bathon J, Salmon J, Beaulieu A, Codding C, Delles C, Sattar N (2010) Lipid and inflammation parameters: a translational, randomized placebo-controlled study to evaluate effects of tocilizumab: the MEASURE study. Oral presentation, 2010 ACR/ARHP annual scientific meeting, Atlanta, GA, 6–11 November 2010
McInnes IB, Lee JS, Wu W, Giles JT, Bathon JM, Salmon JE, Beaulieu AD, Codding CE, Delles C, Sattar N (2011) MEASURE: A translational, randomized, placebo (PBO)-controlled study to evaluate the effects of tocilizumab (TCZ) on parameters of lipids and inflammation. Oral presentation, EULAR 2011, European League Against Rheumatism, London, 25–28 May 2011
Menaa F (2014) Next-generation sequencing or the dilemma of large-scale data analysis: opportunities, insights, and challenges to translational, preventive and personalized medicine. J Investig Genomics 1(1):00005
Niazi A, Leardi R (2012) Genetic algorithms in chemometrics. J Chemometr 26:345–351
Rathore AS, Bhushan N, Hadpe S (2011) Chemometrics applications in biotech processes: a review. Biotechnol Prog 27:307–315
Rathore AS, Mittal S, Pathak M, Mahalingam V (2014) Chemometrics application in biotech processes: assessing comparability across processes and scales. J Chem Technol Biotechnol 89:7
Walczak B, Wu W (2005) Fuzzy warping of chromatograms. Chemometr Intell Lab Syst 77:173–180
Wegman EJ (1990) Hyperdimensional data analysis using parallel coordinates. J Am Stat Assoc 85:664–675
Wold S (1995) Chemometrics; what do we mean with it, and what do we want from it? Chemometr Intell Lab Syst 30:109–115
Wu W, Manne R (2000) Fast regression methods in a Lanczos (or PLS-1) basis. Theory and applications. Chemometr Intell Lab Syst 51:145–161
Wu W, Massart DL (1996) Artificial neural networks in classification of NIR spectral data: selection of the input. Chemometr Intell Lab Syst 35:127–135
Wu W, Massart DL (1997) Regularised nearest neighbour classification method in pattern recognition of near infrared spectra. Anal Chim Acta 349:253–261
Wu W, Walczak B, Massart DL, Prebble KA, Last IR (1995) Spectral transformation and wavelength selection in NIR spectra classification. Anal Chim Acta 315:243–255
Wu W, Rutan SC, Baldovin A, Massart DL (1996a) Feature selection using the Kalman filter for classification of multivariate data. Anal Chim Acta 335:11–22
Wu W, Walczak B, Penninckx W, Massart DL (1996b) Feature reduction by Fourier transform in pattern recognition of NIR data. Anal Chim Acta 331:75–83
Wu W, Mallet Y, Walczak B, Penninckx W, Massart DL, Heuerding S, Erni F (1996c) Comparison of regularized discriminant analysis, linear discriminant analysis and quadratic discriminant analysis, applied to NIR data. Anal Chim Acta 329:257–265
Wu W, Walczak B, Massart DL, Heuerding S, Erni F, Last IR, Prebble KA (1996d) Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometr Intell Lab Syst 33:35–46
Wu W, Massart DL, de Jong S (1997a) The kernel PCA algorithms for wide data, Part I: theory and algorithms. Chemometr Intell Lab Syst 36:165–172
Wu W, Massart DL, de Jong S (1997b) Kernel PCA algorithms for wide data, Part II: Fast cross-validation and application in classification of NIR data. Chemometr Intell Lab Syst 37:271–280
Wu W, Guo Q, de Aguiar PF, Massart DL (1998) The star plot: an alternative display method for multivariate data in the analysis of food and drugs. J Pharm Biomed Anal 17:1001–1013
Wu W, Guo Q, Jouan-Rimbaud D, Massart DL (1999) Using contrasts as data pretreatment method in pattern recognition of multivariate data. Chemometr Intell Lab Syst 45:39–53
Wu W, Wildsmith SE, Winkley AJ, Yallop RM, Elcock F, Bugelski PJ (2001) Chemometric strategies for normalisation of gene expression data obtained from cDNA microarrays. Anal Chim Acta 446:451–466
Wu W, Guo Q, de Jong S, Massart DL (2002) Randomisation test for the number of dimensions of the group average space in generalised Procrustes analysis. Food Qual Prefer 13:191–200
Wu W, Roberts SLL, Cordingley HC, Armitage JR, Tooke P, Wildsmith SE (2003a) Validation of consensus between proteomic and clinical chemical data by applying a new randomisation F-test for generalised Procrustes analysis. Anal Chim Acta 490:365–378
Wu W, Guo Q, Massart DL, Boucon C, de Jong S (2003b) Structure preserving feature selection in PARAFAC using a genetic algorithm and Procrustes analysis. Chemometr Intell Lab Syst 65:83–95
Wu W, Shaw P, Ruan J, Elcock FJ, Wildsmith SE (2005) Optimisation of image analysis process for cDNA microarrays by experimental designs. Chemometr Intell Lab Syst 76:175–184
Young FW, Valero-Mora PM, Friendly M (2006) Visual statistics – seeing data with dynamic interactive graphics. Wiley, Hoboken
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wu, W., Herath, A. (2016). Chemometrics and Predictive Modelling. In: Zhang, L. (eds) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_25
DOI: https://doi.org/10.1007/978-3-319-23558-5_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23557-8
Online ISBN: 978-3-319-23558-5
eBook Packages: Mathematics and Statistics (R0)