Part of the book series: Statistics for Biology and Health (SBH)

Abstract

This chapter focuses on Chemometrics and Predictive Modelling. Chemometrics is a multivariate statistical methodology that developed along a parallel and independent path, growing out of the need to analyse chemical measurements with moderate to large numbers of variables, especially when there are more variables than samples (or objects). In recent years, more and more Chemometrics methods have been absorbed into the mainstream of multivariate and high-dimensional statistics. In this chapter, we explore the methodological foundations of Chemometrics and supplement them with a motivating example that helps the reader appreciate the methodology. We also provide a comprehensive reference list for readers who want to learn more about Chemometric methods.

References

  • Baldovin A, Wu W, Centner V, Jouan-Rimbaud D, Massart DL, Favretto L, Turello A (1996) Feature selection for the discrimination between pollution types with partial least squares modelling. Analyst 121:1603–1608

  • Baldovin A, Wu W, Massart DL, Turello A (1997) Regularized discriminant analysis (RDA)-modelling for the binary discrimination between pollution types. Chemometr Intell Lab Syst 38:25–37

  • Brereton RG (1990) Chemometrics, applications of mathematics and statistics to laboratory systems. Ellis Horwood Limited, Chichester

  • Brown SD, Blank TB, Sum ST, Weyer LG (1994) Chemometrics. Anal Chem 66:315R–359R

  • Candolfi A, Wu W, Centner V, Massart DL (1998) Comparison of classification approaches applied to NIR-spectra of clinical study lots. J Pharm Biomed Anal 16:1329–1347

  • Connor SC, Wu W, Sweatman BC, Manini J, Haselden JN, Crowther DJ, Waterfield CJ (2004) The effects of feeding and body weight loss on the 1H NMR-based urine metabolic profiles of male Wistar Han rats: implications for biomarker discovery. Biomarkers 9:156–179

  • Cordingley HC, Roberts SLL, Tooke P, Armitage JR, Lane PW, Wu W, Wildsmith SE (2003) Multifactorial screening design and analysis of SELDI-TOF ProteinChip array optimisation experiments. Biotechniques 34:364–373

  • Cutler P, Akuffo EL, Bodnar WM, Briggs DM, Davis JB, Debouck CM, Fox SM, Gibson RA, Gormley DA, Holbrook JD, Jacqueline Hunter A, Kinsey EE, Prinjha R, Richardson JC, Roses AD, Smith MA, Tsokanas N, Willé DR, Wu W, Yates JW, Gloger IS (2008) Proteomic identification and early validation of complement 1 inhibitor and pigment epithelium-derived factor: two novel biomarkers of Alzheimer’s disease in human plasma. Proteomics Clin Appl 2:467–477

  • Czekaj T, Wu W, Walczak B (2005) About kernel latent variable approaches and SVM. J Chemometr 19:341–354

  • Czekaj T, Wu W, Walczak B (2008) Classification of genomic data: some aspects of feature selection. Talanta 76:564–574

  • Daszykowski M, Wu W, Nicholls AW, Ball RJ, Walczak B (2007) Identifying potential biomarkers in LC-MS data. J Chemometr 21:292–302

  • Edgington ES (1995) Randomization tests, 3rd edn. Wiley, New York

  • Esbensen K, Geladi P (1990) The start and early history of chemometrics: selected interviews. Part 2. J Chemometr 4:389–412

  • Edgington ES (1964) Randomization tests. J Psychol 57(2):445–449

  • Geladi P, Esbensen K (1990) The start and early history of chemometrics: selected interviews. Part 1. J Chemometr 4:337–354

  • Guo Q, Wu W, Massart DL (1999) The robust normal variate transform for pattern recognition with near-infrared data. Anal Chim Acta 382:87–103

  • Guo Q, Wu W, Questier F, Massart DL, Boucon C, de Jong S (2000) Sequential projection pursuit using genetic algorithms for data mining. Anal Chem 72:2846–2855

  • Guo Q, Wu W, Massart DL, Boucon C, de Jong S (2001) Feature selection in sequential projection pursuit. Anal Chim Acta 446:85–96

  • Guo Q, Wu W, Massart DL, Boucon C, de Jong S (2002) Feature selection in principal component analysis of analytical data. Chemometr Intell Lab Syst 61:123–132

  • Kalivas JH, Roberts N, Sutter JM (1989) Global optimization by simulated annealing with wavelength selection for ultraviolet-visible spectrophotometry. Anal Chem 61:2024–2030

  • Kvalheim OM (1996) Chemometrics, quality, information and the third waves. Chemometr Intell Lab Syst 33:1–2

  • Leardi R (2000) Application of genetic algorithm-PLS for feature selection in spectral data sets. J Chemometr 14:643–655

  • Leardi R, Boggia R, Terrile M (1992) Genetic algorithms as a strategy for feature selection. J Chemometr 6:267–281

  • Massart DL, Vandeginste BGM, Deming SN, Michotte Y, Kaufman L (1988) Chemometrics: a textbook. Elsevier Science Publishers B.V., Amsterdam

  • Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J (1997) Handbook of chemometrics and qualimetrics. Part A. In: Data Handling in Science and Technology, vol 20A. Elsevier, Amsterdam

  • Massart DL, Vandeginste BGM, Buydens LMC, De Jong S, Lewi PJ, Smeyers-Verbeke J (1998) Handbook of chemometrics and qualimetrics. Part B. In: Data Handling in Science and Technology, vol 20B. Elsevier, Amsterdam

  • MATLAB 6.1, The MathWorks Inc., Natick, MA, 2000

  • McInnes IB, Lee JS, Wu W, Giles JT, Bathon J, Salmon J, Beaulieu A, Codding C, Delles C, Sattar N (2010) Lipid and inflammation parameters: a translational, randomized placebo-controlled study to evaluate effects of tocilizumab: the MEASURE study. Oral presentation, 2010 ACR/ARHP annual scientific meeting, Atlanta, GA, 6–11 November 2010

  • McInnes IB, Lee JS, Wu W, Giles JT, Bathon JM, Salmon JE, Beaulieu AD, Codding CE, Delles C, Sattar N (2011) MEASURE: A translational, randomized, placebo (PBO)-controlled study to evaluate the effects of tocilizumab (TCZ) on parameters of lipids and inflammation. Oral presentation, EULAR 2011, European League Against Rheumatism, London, 25–28 May 2011

  • Menaa F (2014) Next-generation sequencing or the dilemma of large-scale data analysis: opportunities, insights, and challenges to translational, preventive and personalized medicine. J Investig Genomics 1(1):00005

  • Niazi A, Leardi R (2012) Genetic algorithms in chemometrics. J Chemometr 26:345–351

  • Rathore AS, Bhushan N, Hadpe S (2011) Chemometrics applications in biotech processes: a review. Biotechnol Prog 27:307–315

  • Rathore AS, Mittal S, Pathak M, Mahalingam V (2014) Chemometrics application in biotech processes: assessing comparability across processes and scales. J Chem Technol Biotechnol 89:7

  • Walczak B, Wu W (2005) Fuzzy warping of chromatograms. Chemometr Intell Lab Syst 77:173–180

  • Wegman EJ (1990) Hyperdimensional data analysis using parallel coordinates. J Am Stat Assoc 85:664–675

  • Wold S (1995) Chemometrics; what do we mean with it, and what do we want from it? Chemometr Intell Lab Syst 30:109–115

  • Wu W, Manne R (2000) Fast regression methods in a Lanczos (or PLS-1) basis. Theory and applications. Chemometr Intell Lab Syst 51:145–161

  • Wu W, Massart DL (1996) Artificial neural networks in classification of NIR spectral data: selection of the input. Chemometr Intell Lab Syst 35:127–135

  • Wu W, Massart DL (1997) Regularised nearest neighbour classification method in pattern recognition of near infrared spectra. Anal Chim Acta 349:253–261

  • Wu W, Walczak B, Massart DL, Prebble KA, Last IR (1995) Spectral transformation and wavelength selection in NIR spectra classification. Anal Chim Acta 315:243–255

  • Wu W, Rutan SC, Baldovin A, Massart DL (1996a) Feature selection using the Kalman filter for classification of multivariate data. Anal Chim Acta 335:11–22

  • Wu W, Walczak B, Penninckx W, Massart DL (1996b) Feature reduction by Fourier transform in pattern recognition of NIR data. Anal Chim Acta 331:75–83

  • Wu W, Mallet Y, Walczak B, Penninckx W, Massart DL, Heuerding S, Erni F (1996c) Comparison of regularized discriminant analysis, linear discriminant analysis and quadratic discriminant analysis, applied to NIR data. Anal Chim Acta 329:257–265

  • Wu W, Walczak B, Massart DL, Heuerding S, Erni F, Last IR, Prebble KA (1996d) Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometr Intell Lab Syst 33:35–46

  • Wu W, Massart DL, de Jong S (1997a) The kernel PCA algorithms for wide data, Part I: theory and algorithms. Chemometr Intell Lab Syst 36:165–172

  • Wu W, Massart DL, de Jong S (1997b) Kernel PCA algorithms for wide data, Part II: Fast cross-validation and application in classification of NIR data. Chemometr Intell Lab Syst 37:271–280

  • Wu W, Guo Q, de Aguiar PF, Massart DL (1998) The star plot: an alternative display method for multivariate data in the analysis of food and drugs. J Pharm Biomed Anal 17:1001–1013

  • Wu W, Guo Q, Jouan-Rimbaud D, Massart DL (1999) Using contrasts as data pretreatment method in pattern recognition of multivariate data. Chemometr Intell Lab Syst 45:39–53

  • Wu W, Wildsmith SE, Winkley AJ, Yallop RM, Elcock F, Bugelski PJ (2001) Chemometric strategies for normalisation of gene expression data obtained from cDNA microarrays. Anal Chim Acta 446:451–466

  • Wu W, Guo Q, de Jong S, Massart DL (2002) Randomisation test for the number of dimensions of the group average space in generalised Procrustes analysis. Food Qual Prefer 13:191–200

  • Wu W, Roberts SLL, Cordingley HC, Armitage JR, Tooke P, Wildsmith SE (2003a) Validation of consensus between proteomic and clinical chemical data by applying a new randomisation F-test for generalised Procrustes analysis. Anal Chim Acta 490:365–378

  • Wu W, Guo Q, Massart DL, Boucon C, de Jong S (2003b) Structure preserving feature selection in PARAFAC using a genetic algorithm and Procrustes analysis. Chemometr Intell Lab Syst 65:83–95

  • Wu W, Shaw P, Ruan J, Elcock FJ, Wildsmith SE (2005) Optimisation of image analysis process for cDNA microarrays by experimental designs. Chemometr Intell Lab Syst 76:175–184

  • Young FW, Valero-Mora PM, Friendly M (2006) Visual statistics – seeing data with dynamic interactive graphics. Wiley, Hoboken

Author information

Corresponding author

Correspondence to Wen Wu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wu, W., Herath, A. (2016). Chemometrics and Predictive Modelling. In: Zhang, L. (ed.) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_25
