Use of advanced modelling methods to estimate radiata pine productivity indices

https://doi.org/10.1016/j.foreco.2020.118557

Highlights

  • Productivity indices have been used to describe age-normalised height and volume.

  • We predict productivity indices for radiata pine using a range of modelling methods.

  • Regression kriging models based on random forest, XGBoost and regression were the most precise.

  • An ensemble model using all three methods was more precise than individual models.

  • New Zealand productivity index maps for radiata pine were developed.

Abstract

Site productivity indices have been widely used to describe age-normalised height and volume for a range of forest species. In this study, we used a wide range of modelling methods to predict Site Index and 300 Index for Pinus radiata D. Don. Site Index normalises height to a standardised age, while the 300 Index normalises volume measurements to a standardised age, stand density and set of silvicultural conditions. These two indices were derived from a national database of 3,676 plots with predictors extracted from geospatial surfaces describing key landform, topographic, climatic, edaphic and species-specific features (e.g. disease severity). Using these data, our objectives were to (i) compare the accuracy of geospatial, parametric and non-parametric models in predicting Site Index and 300 Index, (ii) determine whether regression kriging could be used to improve the accuracy of these predictions, (iii) identify the most influential predictors of these two indices and (iv) produce maps of both indices across New Zealand. All predictions were made on a test dataset (n = 1,104) that was not used for model fitting.

The two non-parametric models eXtreme Gradient Boosting (XGBoost) and random forest provided the most precise predictions of Site Index and 300 Index and markedly outperformed both parametric and geospatial models (ordinary kriging, inverse distance weighting). Random forest provided the most precise predictions of Site Index (R2 = 0.811, RMSE = 2.027 m, RMSE% = 6.73%) while XGBoost most precisely predicted 300 Index (R2 = 0.676, RMSE = 3.462 m3 ha−1 yr−1, RMSE% = 12.63%).

The use of regression kriging improved the fit of all but one model through accounting for spatial co-variance in the model error. Gains in precision were most marked for the parametric models, and in particular the regression model. After kriging, the three most precise models for both indices were random forest, followed by XGBoost and the regression model. An ensemble model derived from the mean predictions of these three models provided the most precise predictions, among all tested models, for both Site Index (R2 = 0.818, RMSE = 1.991 m, RMSE% = 6.61%) and 300 Index (R2 = 0.691, RMSE = 3.384 m3 ha−1 yr−1, RMSE% = 12.35%). Fitting a range of models to productivity indices was found to be a useful approach as this allows creation of an ensemble model and provides greater insight into the key determinants of productivity.
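The fit statistics quoted above (R2, RMSE and RMSE%) can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not state: R2 is taken as one minus the ratio of residual to total sum of squares, and RMSE% is taken as RMSE expressed as a percentage of the mean observed value. The function name `fit_statistics` and the example values are hypothetical.

```python
import math


def fit_statistics(observed, predicted):
    """Return (R2, RMSE, RMSE%) for paired observed and predicted values.

    Assumed definitions (not confirmed by the source):
      R2    = 1 - SS_residual / SS_total
      RMSE% = 100 * RMSE / mean(observed)
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rmse_pct = 100.0 * rmse / mean_obs
    return r2, rmse, rmse_pct


# Hypothetical Site Index values (m) for four test plots.
r2, rmse, rmse_pct = fit_statistics([28.0, 30.0, 32.0, 34.0], [29.0, 30.0, 31.0, 35.0])
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f} m, RMSE% = {rmse_pct:.2f}%")
```

As in the study, such statistics are only informative about predictive precision when computed on plots withheld from model fitting.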

Introduction

Pinus radiata D. Don (radiata pine) has been widely planted in the Southern Hemisphere and constitutes a large proportion of plantations in New Zealand, Chile and Australia (Lewis and Ferguson, 1993). This species is very responsive to environment and as a consequence productivity has been found to range widely across the environments over which it is grown (Palmer et al., 2009b, Watt et al., 2010). Previous studies have shown that much of this variation in productivity is attributable to air temperature, water availability, soil nutrition and windspeed (Watt et al., 2010, Kirschbaum and Watt, 2011).

Stand productivity is predicted by empirical models as a function of stand age, using non-linear functional forms. Variation in productivity between stands is accounted for by standardised measurements of productivity at a given age that are used to adjust both the trajectory and the asymptote of predictions of productivity over time. Site Index, which expresses the height of dominant or co-dominant trees at a given age (Skovsgaard and Vanclay, 2008), has been most widely used to account for this inter-stand variation as this metric is correlated with productivity (Eichhorn, 1904, Bontemps and Bouriaud, 2014) and the height of dominant trees is relatively invariant to stand density (Pienaar and Shiver, 1984, Lanner, 1985, Maclaren et al., 1995). Although Site Index is very useful for describing standardised variation in height it does have limitations as a productivity metric because stand height does not account for variation in basal area (Hasenauer et al., 1994, Vanclay et al., 1995, Skovsgaard and Vanclay, 2008). As a consequence, indices that normalise stand volume as a function of age, stand density and other important silvicultural variables have been developed. This productivity metric for P. radiata, which essentially describes a normalised mean annual increment at age 30, is termed the 300 Index (Kimberley et al., 2005).

Environmental surfaces have been widely used through a range of modelling approaches to develop maps of Site Index and 300 Index for P. radiata (Palmer et al., 2009b, Kimberley et al., 2017) and Site Index for many other coniferous tree species (Fontes et al., 2003, Wang et al., 2004, Seynave et al., 2005, Monserud et al., 2006, Watt et al., 2009, Palmer et al., 2012). Compared to direct measurements of these indices made using plot data, which are typically averaged to the stand level, predictions of indices from environmental surfaces open up a range of applications that are not available from traditional inventory. The resulting spatial descriptions enhance understanding of the key environmental drivers of productivity and where optimal productivity is likely to occur in both existing forests and unplanted areas (Palmer et al., 2009b, Kimberley et al., 2017). Surfaces of Site Index and 300 Index can also be used as input to models used for key management decisions such as the optimisation of final crop stand density (Sopt) and development of surfaces showing spatial variation in Sopt (Watt et al., 2017).

A large number of modelling methods with varying levels of complexity have been used to predict Site Index for a wide range of forest species growing in Europe, North America and New Zealand. These methods range from relatively simple approaches such as multiple regression (Wang, 1995, Chen et al., 2002, Sánchez-Rodríguez et al., 2002, Hamel et al., 2004, Nigh et al., 2004, Wang et al., 2004, Seynave et al., 2005, Monserud et al., 2006, Seynave et al., 2008, Socha, 2008, Pinno et al., 2009, Watt et al., 2009, Aertsen et al., 2010, Watt et al., 2010, Aertsen et al., 2011, Palmer et al., 2012, Sharma et al., 2012, Codilan et al., 2015) to more complex parametric methods such as partial least squares (PLS), lasso, elastic net, least angle regression and infinitesimal forward stagewise regression (González-Rodríguez and Diéguez-Aranda, 2020). A wide range of non-parametric methodologies have also been used to model Site Index including random forest (Weiskittel et al., 2011, Sabatia and Burkhart, 2014), boosted trees (Aertsen et al., 2010, Aertsen et al., 2011), classification and regression trees (Aertsen et al., 2010, Aertsen et al., 2011), neural networks (Aertsen et al., 2010), generalised additive models (Aertsen et al., 2010, Aertsen et al., 2011, Shen et al., 2015), and multivariate adaptive regression splines (González-Rodríguez and Diéguez-Aranda, 2020).

Parametric methods that utilise the spatial correlation between the underlying plot data have been less frequently used to develop models of stand productivity. Amongst these geostatistical methods the most commonly used techniques are ordinary kriging and regression kriging (Palmer et al., 2009a, Palmer et al., 2009b, Palmer et al., 2009c, Kimberley et al., 2017). As predictions are made by ordinary kriging through interpolating values between measured plots, this method is most precise when plots are located in relatively close proximity (Palmer et al., 2009b). Regression kriging is less reliant on a dense plot network than ordinary kriging as this method fits an underlying regression model and then geospatially refines these estimates through kriging the model residual variation across the area of interest (Palmer et al., 2009b). Empirical Bayesian kriging is a recently developed method that overcomes some limitations of regression kriging and can account for local variation in spatial autocorrelation (Samsonova et al., 2017, Gribov and Krivoruchko, 2020).
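The two-step logic of regression kriging described above (fit a trend model, then krige its residuals and add them back) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the studies cited: the trend is a single-covariate linear regression standing in for whatever base model is used, the exponential semivariogram has fixed, hypothetical `sill` and `rng` values where in practice these would be estimated from the residuals, and all plot data and function names are invented for the example.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krige_residual(coords, residuals, target, sill=1.0, rng=50.0):
    """Ordinary kriging of residuals at `target`.

    Uses an exponential semivariogram with no nugget; `sill` and `rng`
    are illustrative assumptions, not fitted values.
    """
    def gamma(p, q):
        h = math.hypot(p[0] - q[0], p[1] - q[1])
        return sill * (1.0 - math.exp(-h / rng))

    n = len(coords)
    # Kriging system: semivariances between plots, plus the unbiasedness constraint.
    a = [[gamma(coords[i], coords[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(c, target) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))


def regression_kriging(coords, covariate, observed, target_xy, target_cov):
    """Trend prediction from the regression plus the kriged residual."""
    a, b = fit_line(covariate, observed)
    residuals = [o - (a + b * c) for o, c in zip(observed, covariate)]
    return a + b * target_cov + krige_residual(coords, residuals, target_xy)


# Hypothetical plots: coordinates (km), mean annual temperature (°C), Site Index (m).
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
temperature = [10.0, 12.0, 11.0, 14.0, 13.0]
site_index = [25.0, 29.0, 27.5, 33.0, 30.0]
print(regression_kriging(coords, temperature, site_index, (4.0, 6.0), 12.5))
```

With no nugget, the kriged residual reproduces the observed residual at each plot location, so the combined prediction honours the plot data there while reverting towards the regression trend away from the plots.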

In comparison to Site Index, only a few parametric modelling methods have been used to predict volume-related indices such as the 300 Index. Using a relatively small dataset of 23 plots, a regression model was used to predict the 400 Index for New Zealand-grown Sequoia sempervirens D. Don (coast redwood) (Palmer et al., 2012). Partial least squares, ordinary kriging and regression kriging were used to predict 300 Index of P. radiata using an extensive national dataset covering all environmental conditions found throughout New Zealand plantations (Palmer et al., 2009b, Kimberley et al., 2017).

The recent emergence of advanced machine learning methods allows greater utilisation of the increasing amount of information in geospatial surfaces as these models can often accommodate collinearity between closely correlated environmental variables. Despite this advantage, few studies have compared predictive precision of these methods with more traditional approaches. In European forests, Site Index was more precisely predicted using non-parametric methods than regression, and amongst non-parametric methods, artificial neural networks had the highest predictive performance (Aertsen et al., 2011). Site Index of plantation grown loblolly pine (Pinus taeda L.) in the United States was more precisely predicted using the non-parametric random forest than parametric non-linear regression, but it was noted that random forest had the most potential for erroneous predictions when extrapolating the model beyond the fitted range (Sabatia and Burkhart, 2014).

Comparative studies of model performance undertaken in P. radiata plantations have highlighted the precision of regression kriging and more advanced non-parametric models, but as with other forest species, have not included a comprehensive comparison of models. Within New Zealand plantations, regression kriging using PLS was marginally more precise than ordinary kriging, which in turn was more precise than use of only PLS for prediction of Site Index and 300 Index using a national dataset (Palmer et al., 2009a, Palmer et al., 2009b, Palmer et al., 2009c, Kimberley et al., 2017). Using a regional New Zealand dataset, extracted from a central North Island forest, regression models of Site Index were found to have a slightly superior precision to those created using random forest (Watt et al., 2015). A comparison of seven modelling methods using data collected from northwest Spain found the non-parametric multivariate adaptive regression splines (MARS) most precisely predicted Site Index, which was closely followed by the parametric methods of stepwise regression and PLS. The best MARS model accounted for 50% of the variation in the data using 13 predictors (González-Rodríguez and Diéguez-Aranda, 2020).

One advantage of fitting a range of modelling methods to the data is that it allows the development of ensemble models. A modelling ensemble is a group of models trained by different algorithms that is combined to produce a final set of predictions (Nisbet et al., 2009, Diks and Vrugt, 2010). Predictions of soil classes and properties using a model ensemble have often been found to be more precise than individual modelling methods (e.g. Padarian et al., 2014, Taghizadeh-Mehrjardi et al., 2019) but not always (Dobarco et al., 2017). Although model averaging has been widely used for hydrologic applications (Diks and Vrugt, 2010, Najafi et al., 2011), climate modelling (Benestad, 2002, Min and Hense, 2006) and prediction of soil attributes (Padarian et al., 2014, Dobarco et al., 2017, Taghizadeh-Mehrjardi et al., 2019), applications within forestry are scarce and we are unaware of any research that has used this method for prediction of productivity indices.
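The simplest form of such an ensemble, and the one described in the abstract, is an unweighted mean of the predictions from the contributing models. The sketch below shows only that averaging step; the function name `ensemble_mean` and the prediction values are hypothetical, and no claim is made that the averaged predictions are more precise for any particular dataset.

```python
def ensemble_mean(*model_predictions):
    """Average predictions plot by plot across models (unweighted mean)."""
    lengths = {len(p) for p in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of plots")
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical Site Index predictions (m) for four plots from three models.
random_forest = [31.0, 27.0, 32.0, 27.0]
xgboost = [29.0, 29.5, 34.5, 25.0]
regression = [30.5, 27.5, 34.0, 24.5]

ensemble = ensemble_mean(random_forest, xgboost, regression)
print(ensemble)
```

Weighted schemes are also possible, but equal weights avoid having to estimate additional parameters from the fitting data.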

It would clearly be useful to undertake a study that comprehensively compares the predictive precision of a range of advanced modelling methods that are augmented using the regression kriging methodology. Using an extensive national dataset obtained from New Zealand grown P. radiata, the objectives of this research were to (i) compare precision of models of Site Index and 300 Index created using a wide range of parametric, geospatial and non-parametric methods, (ii) examine how regression kriging influences precision of the base parametric and non-parametric models, (iii) determine if averaging predictions from the best models, using an ensemble approach, improves overall model precision, (iv) extract the variables of key importance from multiple regression and the most precise non-parametric models to gain a greater understanding of the key determinants of these productivity metrics and (v) produce maps of both productivity indices from the most precise models.

Section snippets

Study area and dataset preparation

The study area from which data was sourced encompassed the range of environmental conditions over which P. radiata is grown in the North and South Island of New Zealand (Fig. 1). Stand level data for P. radiata was extracted from the New Zealand Forest Research Institute Permanent Sample Plot database (Pillar and Dunlop, 1990). Plots that received treatments within spacing, disturbance (forest floor removal), oversowing and fertiliser application trials were excluded but the control plots were

Plot representation of environmental conditions

Plot measurements covered a wide variation in environmental conditions and this variation was very similar between the training and test datasets (Table 1). The ranges in mean annual air temperature, total annual rainfall and mean annual windspeed were respectively 7.7 – 16.0 °C, 551 – 3,587 mm and 4.69 – 26.9 km hr−1 (Table 1). Although there are colder, wetter and windier locations throughout New Zealand (Fig. 2) the plots used here encompassed the range in climatic conditions over which P.

Discussion

This study clearly demonstrates the utility of non-parametric models in predicting landscape level variation in productivity metrics. The results also show that more traditional methods, such as regression, can provide comparable precision when residuals from these models are kriged. Air temperature and water availability were found to be the key determinants of Site Index while 300 Index was also strongly influenced by these variables and to a lesser extent soil fertility and disease

Conclusion

The base model precision of both productivity metrics generally increased with model complexity and the ability of the model to accommodate collinearity. The two regression tree methods (XGBoost and random forest) provided the most precise predictions as they can both accommodate complex functional forms, variable interactions and are relatively unaffected by the inclusion of many collinear variables in the model. Regression kriging provided significant gains in precision for almost all models

CRediT authorship contribution statement

Michael S. Watt: Conceptualization, Methodology, Software, Formal analysis, Visualization, Writing - original draft, Writing - review & editing, Supervision, Project administration. David J. Palmer: Data curation, Software, Formal analysis, Writing - review & editing. Ellen Mae C. Leonardo: Software, Formal analysis, Visualization, Writing - review & editing. Maxime Bombrun: Data curation, Software, Formal analysis, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We are grateful to Christine Dodunski for her assistance in obtaining permission to use PSP data and for extracting this data. We are also indebted to numerous forestry companies for supporting this research and providing the permission to use the data. The New Zealand Scion Strategic Science Investment Fund (SSIF) was used to fund this project. We are grateful to two anonymous reviewers who provided valuable feedback and suggestions that markedly improved the manuscript.

References (114)

  • Y. Liu et al.

    Integrate machine learning and geostatistics for high-resolution mapping of ground-level PM2.5 concentrations

  • Y. Liu et al.

    Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach

    Environ. Pollut.

    (2018)
  • I.O.A. Odeh et al.

    Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging

    Geoderma

    (1995)
  • J. Padarian et al.

    Predicting and mapping the soil available water capacity of Australian wheatbelt

    Geoderma Reg.

    (2014)
  • D.J. Palmer et al.

    Comparison of spatial prediction techniques for developing Pinus radiata productivity surfaces across New Zealand

    For. Ecol. Manage.

    (2009)
  • D.J. Palmer et al.

    Assessing prediction accuracy in a regression kriging surface of Pinus radiata outerwood density across New Zealand

    For. Ecol. Manage.

    (2013)
  • R.L. Parfitt et al.

    Relationships between soil biota, nitrogen and phosphorus availability, and pasture growth under organic and conventional management

    Appl. Soil Ecol.

    (2005)
  • B.D. Pinno et al.

    Predicting productivity of trembling aspen in the Boreal Shield ecozone of Quebec using different sources of soil and site information

    For. Ecol. Manage.

    (2009)
  • D.S. Ross et al.

    Mineralization and nitrification patterns at eight northeastern USA forested research sites

    For. Ecol. Manage.

    (2004)
  • C.O. Sabatia et al.

    Predicting site index of plantation loblolly pine from biophysical variables

    For. Ecol. Manage.

    (2014)
  • F. Sánchez-Rodríguez et al.

    Influence of edaphic factors and tree nutritive status on the productivity of Pinus radiata D. Don plantations in northwestern Spain

    For. Ecol. Manage.

    (2002)
  • J.K. Vanclay et al.

    Assessing the quality of permanent sample plot databases for growth modelling in forest plantations

    For. Ecol. Manage.

    (1995)
  • K. Vaysse et al.

    Evaluating digital soil mapping approaches for mapping GlobalSoilMap soil properties from legacy data in Languedoc-Roussillon (France)

    Geoderma Reg.

    (2015)
  • K. Vaysse et al.

    Using quantile regression forest to estimate uncertainty of digital soil mapping products

    Geoderma

    (2017)
  • M.S. Watt et al.

    Comparing parametric and non-parametric methods of predicting Site Index for radiata pine using combinations of data derived from environmental surfaces, satellite imagery and airborne laser scanning

    For. Ecol. Manage.

    (2015)
  • M.O. Akinwande et al.

    Variance inflation factor: as a condition for the inclusion of suppressor variable (s) in regression analysis

    Open J. Statistics

    (2015)
  • Barringer, J.R.F., Pairman, D. and McNeill, S.J., 2002. Development of a high‐resolution digital elevation model for...
  • R.E. Benestad

    Empirically downscaled temperature scenarios for northern Europe based on a multi-model ensemble

    Climate Res.

    (2002)
  • Bentéjac, C., Csörgő, A. and Martínez-Muñoz, G., 2019. A Comparative Analysis of XGBoost. arXiv preprint...
  • B. Boehmke et al.

    Hands-on machine learning with R

    (2019)
  • M. Bombrun et al.

    Forest-Scale Phenotyping: Productivity Characterisation Through Machine Learning

    Front. Plant Sci.

    (2020)
  • J.-D. Bontemps et al.

    Predictive approaches to forest site productivity: recent trends, challenges and future perspectives

    Forestry

    (2014)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • L. Breiman et al.

    Classification and regression trees

    (1984)
  • P.A. Burrough et al.

    Principles of geographical information systems

    (2015)
  • Chen, X. W. and Jeong, J. C. 2007. Enhanced recursive feature elimination. In Sixth International Conference on Machine...
  • H.Y.H. Chen et al.

    Trembling aspen site index in relation to environmental measures of site quality at two spatial scales

    Can. J. For. Res.

    (2002)
  • T. Chen et al.

    Xgboost: A scalable tree boosting system

  • A.L. Codilan et al.

    Estimating site index from ecological factors for industrial tree plantation species in Mindanao, Philippines

    Bull. Univ. Tokyo For.

    (2015)
  • C.G.H. Diks et al.

    Comparison of point forecast accuracy of model averaging methods in hydrologic applications

    Stoch. Env. Res. Risk Assess.

    (2010)
  • M.R. Dobarco et al.

    Prediction of topsoil texture for Region Centre (France) applying model ensemble methods

    Geoderma

    (2017)
  • J. Dymond et al.

    Roll out of erosion models for Regional Councils

    Landcare Res. Contract Report LC0708/094

    (2008)
  • F. Eichhorn

    Beziehungen zwischen Bestandshöhe und Bestandsmasse

    Allgemeine Forst-und Jagdzeitung

    (1904)
  • L. Fontes et al.

    Modelling the Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) site index from site factors in Portugal

    Forestry

    (2003)
  • J.H. Friedman

    Multivariate adaptive regression splines

    Ann. Stat.

    (1991)
  • J.H. Friedman

    Greedy function approximation: a gradient boosting machine

    Ann. Stat.

    (2001)
  • J.C. Gallant et al.

    A multiresolution index of valley bottom flatness for mapping depositional areas

    Water Resour. Res.

    (2003)
  • A. Goldstein et al.

    Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation

    J. Comput. Graphical Statistics

    (2015)
  • C.L. Goodale et al.

    The long-term effects of land-use history on nitrogen cycling in northern hardwood forests

    Ecol. Appl.

    (2001)
  • C.J. Goulding

    Measurement of trees. Section 6.5 of the NZIF Forestry Handbook

    (2005)