Use of advanced modelling methods to estimate radiata pine productivity indices
Introduction
Pinus radiata D. Don (radiata pine) has been widely planted in the Southern Hemisphere and constitutes a large proportion of plantations in New Zealand, Chile and Australia (Lewis and Ferguson, 1993). This species is very responsive to environment and as a consequence productivity has been found to range widely across the environments over which it is grown (Palmer et al., 2009b, Watt et al., 2010). Previous studies have shown that much of this variation in productivity is attributable to air temperature, water availability, soil nutrition and windspeed (Watt et al., 2010, Kirschbaum and Watt, 2011).
Stand productivity is predicted by empirical models as a function of stand age, using non-linear functional forms. Variation in productivity between stands is accounted for by standardised measurements of productivity at a given age that are used to adjust both the trajectory and the asymptote of predictions of productivity over time. Site Index, which expresses the height of dominant or co-dominant trees at a given age (Skovsgaard and Vanclay, 2008), has been most widely used to account for this inter-stand variation as this metric is correlated with productivity (Eichhorn, 1904, Bontemps and Bouriaud, 2014) and the height of dominant trees is relatively invariant to stand density (Pienaar and Shiver, 1984, Lanner, 1985, Maclaren et al., 1995). Although Site Index is very useful for describing standardised variation in height, it has limitations as a productivity metric because stand height does not account for variation in basal area (Hasenauer et al., 1994, Vanclay et al., 1995, Skovsgaard and Vanclay, 2008). As a consequence, indices that normalise stand volume as a function of age, stand density and other important silvicultural variables have been developed. For P. radiata, one such metric, which essentially describes a normalised mean annual increment at age 30, is termed the 300 Index (Kimberley et al., 2005).
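As a rough illustration of how a Site Index value can anchor a non-linear height-age curve, the sketch below uses a Chapman-Richards-type form. The functional form, the base age of 20 years, the function name and the parameters k and c are all assumptions chosen for illustration; they are not taken from this study or fitted to any data. In this simplified form Site Index rescales the whole curve, whereas the models described above may also alter the shape of the trajectory.

```python
import math


def dominant_height(age, site_index, base_age=20.0, k=0.08, c=1.5):
    """Height (m) of dominant trees at a given age (years).

    Chapman-Richards-type curve, anchored so that the predicted height
    equals site_index when age == base_age. The values of base_age, k
    and c are hypothetical, not fitted.
    """
    if age <= 0:
        return 0.0
    # Ratio of the growth term at the requested age to that at the base age.
    scale = (1.0 - math.exp(-k * age)) / (1.0 - math.exp(-k * base_age))
    return site_index * scale ** c


# Two hypothetical stands of contrasting productivity, tracked over time.
for si in (25.0, 35.0):
    heights = [round(dominant_height(a, si), 1) for a in (5, 10, 20, 30)]
    print(f"Site Index {si}: {heights}")
```

By construction, each curve passes through its Site Index value at the base age, so stands with different indices follow separate trajectories toward different asymptotes.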
Environmental surfaces have been widely used through a range of modelling approaches to develop maps of Site Index and 300 Index for P. radiata (Palmer et al., 2009b, Kimberley et al., 2017) and Site Index for many other coniferous tree species (Fontes et al., 2003, Wang et al., 2004, Seynave et al., 2005, Monserud et al., 2006, Watt et al., 2009, Palmer et al., 2012). Compared to direct measurements of these indices made using plot data, which are typically averaged to the stand level, predictions of indices from environmental surfaces open up a range of applications that are not available from traditional inventory. The resulting spatial descriptions enhance understanding of the key environmental drivers of productivity and where optimal productivity is likely to occur in both existing forests and unplanted areas (Palmer et al., 2009b, Kimberley et al., 2017). Surfaces of Site Index and 300 Index can also be used as input to models used for key management decisions such as the optimisation of final crop stand density (Sopt) and development of surfaces showing spatial variation in Sopt (Watt et al., 2017).
A large number of modelling methods with varying levels of complexity have been used to predict Site Index for a wide range of forest species growing in Europe, North America and New Zealand. These methods range from relatively simple approaches such as multiple regression (Wang, 1995, Chen et al., 2002, Sánchez-Rodríguez et al., 2002, Hamel et al., 2004, Nigh et al., 2004, Wang et al., 2004, Seynave et al., 2005, Monserud et al., 2006, Seynave et al., 2008, Socha, 2008, Pinno et al., 2009, Watt et al., 2009, Aertsen et al., 2010, Watt et al., 2010, Aertsen et al., 2011, Palmer et al., 2012, Sharma et al., 2012, Codilan et al., 2015) to more complex parametric methods such as partial least squares (PLS), lasso, elastic net, least angle regression and infinitesimal forward stagewise regression (González-Rodríguez and Diéguez-Aranda, 2020). A wide range of non-parametric methodologies have also been used to model Site Index including random forest (Weiskittel et al., 2011, Sabatia and Burkhart, 2014), boosted trees (Aertsen et al., 2010, Aertsen et al., 2011), classification and regression trees (Aertsen et al., 2010, Aertsen et al., 2011), neural networks (Aertsen et al., 2010), generalised additive models (Aertsen et al., 2010, Aertsen et al., 2011, Shen et al., 2015), and multivariate adaptive regression splines (González-Rodríguez and Diéguez-Aranda, 2020).
Parametric methods that utilise the spatial correlation between the underlying plot data have been less frequently used to develop models of stand productivity. Amongst these geostatistical methods, the most commonly used techniques are ordinary kriging and regression kriging (Palmer et al., 2009a, Palmer et al., 2009b, Palmer et al., 2009c, Kimberley et al., 2017). Because ordinary kriging makes predictions by interpolating values between measured plots, this method is most precise when plots are located in relatively close proximity (Palmer et al., 2009b). Regression kriging is less reliant on a dense plot network than ordinary kriging, as this method fits an underlying regression model and then geospatially refines these estimates by kriging the model residuals across the area of interest (Palmer et al., 2009b). Empirical Bayesian kriging is a recently developed method that overcomes some limitations of regression kriging and can account for local variation in spatial autocorrelation (Samsonova et al., 2017, Gribov and Krivoruchko, 2020).
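The two-step logic of regression kriging described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: it uses a single covariate, an exponential semivariogram whose parameters are assumed rather than fitted, and made-up plot values. All function names are hypothetical.

```python
import math


def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b * x (one covariate, for brevity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def exp_variogram(h, nugget=0.0, psill=1.0, rng=10.0):
    """Exponential semivariogram; parameter values are assumed, not fitted."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krige_residual(coords, resid, target, **vg):
    """Ordinary kriging of regression residuals at one target location."""
    n = len(coords)
    # Semivariances between plots, bordered by the unbiasedness constraint
    # that forces the kriging weights to sum to one.
    a = [[exp_variogram(math.dist(coords[i], coords[j]), **vg)
          for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_variogram(math.dist(c, target), **vg) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, resid))


def regression_kriging(coords, x, y, target, x_target, **vg):
    """Regression prediction plus the kriged residual at the target."""
    a, b = fit_simple_regression(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a + b * x_target + krige_residual(coords, resid, target, **vg)


# Hypothetical plots: coordinates (km), air temperature (deg C), Site Index (m).
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
temp = [10.0, 12.0, 11.0, 14.0, 13.0]
site_index = [25.0, 29.0, 26.5, 33.0, 30.0]
print(regression_kriging(coords, temp, site_index, (2.0, 3.0), 11.5))
```

With a zero nugget, ordinary kriging reproduces the observed residual at a measured plot, so the combined prediction there matches the observation. Away from the plots, the residual correction shrinks with distance and the prediction falls back toward the regression estimate, which is why the method depends less on a dense plot network.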
In comparison to Site Index, only a few parametric modelling methods have been used to predict volume related indices such as 300 Index. Using a relatively small dataset of 23 plots a regression model was used to predict the 400 Index for New Zealand grown Sequoia sempervirens D. Don (coast redwood) (Palmer et al., 2012). Partial least squares, ordinary kriging and regression kriging were used to predict 300 Index of P. radiata using an extensive national dataset covering all environmental conditions found throughout New Zealand plantations (Palmer et al., 2009b, Kimberley et al., 2017).
The recent emergence of advanced machine learning methods allows greater utilisation of the increasing amount of information in geospatial surfaces as these models can often accommodate collinearity between closely correlated environmental variables. Despite this advantage, few studies have compared predictive precision of these methods with more traditional approaches. In European forests, Site Index was more precisely predicted using non-parametric methods than regression, and amongst non-parametric methods, artificial neural networks had the highest predictive performance (Aertsen et al., 2011). Site Index of plantation grown loblolly pine (Pinus taeda L.) in the United States was more precisely predicted using the non-parametric random forest than parametric non-linear regression, but it was noted that random forest had the most potential for erroneous predictions when extrapolating the model beyond the fitted range (Sabatia and Burkhart, 2014).
Comparative studies of model performance undertaken in P. radiata plantations have highlighted the precision of regression kriging and more advanced non-parametric models, but as with other forest species, have not included a comprehensive comparison of models. Within New Zealand plantations, regression kriging using PLS was marginally more precise than ordinary kriging, which in turn was more precise than use of only PLS for prediction of Site Index and 300 Index using a national dataset (Palmer et al., 2009a, Palmer et al., 2009b, Palmer et al., 2009c, Kimberley et al., 2017). Using a regional New Zealand dataset, extracted from a central North Island forest, regression models of Site Index were found to have a slightly superior precision to those created using random forest (Watt et al., 2015). A comparison of seven modelling methods using data collected from northwest Spain found the non-parametric multivariate adaptive regression splines (MARS) most precisely predicted Site Index, which was closely followed by the parametric methods of stepwise regression and PLS. The best MARS model accounted for 50% of the variation in the data using 13 predictors (González-Rodríguez and Diéguez-Aranda, 2020).
One advantage of fitting a range of modelling methods to the data is that it allows the development of ensemble models. A modelling ensemble is a group of models trained by different algorithms that is combined to produce a final set of predictions (Nisbet et al., 2009, Diks and Vrugt, 2010). Predictions of soil classes and properties using a model ensemble have often been found to be more precise than individual modelling methods (e.g. Padarian et al., 2014, Taghizadeh-Mehrjardi et al., 2019) but not always (Dobarco et al., 2017). Although model averaging has been widely used for hydrologic applications (Diks and Vrugt, 2010, Najafi et al., 2011), climate modelling (Benestad, 2002, Min and Hense, 2006) and prediction of soil attributes (Padarian et al., 2014, Dobarco et al., 2017, Taghizadeh-Mehrjardi et al., 2019), applications within forestry are scarce and we are unaware of any research that has used this method for prediction of productivity indices.
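A minimal sketch of the simplest form of model ensemble, an equal-weight average of predictions from several models, is given below. The prediction values and names are hypothetical and serve only to show the mechanics; weighted averaging schemes are also possible.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def ensemble_mean(predictions):
    """Average predictions plot by plot across models (equal weights)."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


# Hypothetical Site Index observations (m) and predictions from two models.
observed = [29.0, 30.0, 26.0]
model_a = [28.0, 31.0, 25.0]
model_b = [30.5, 29.0, 27.0]

combined = ensemble_mean([model_a, model_b])
for name, pred in (("A", model_a), ("B", model_b), ("ensemble", combined)):
    print(name, round(rmse(observed, pred), 3))
```

An equal-weight average can never have a higher RMSE than the mean of the individual model RMSEs, because errors of opposite sign partly cancel. It can still be less precise than the single best model, which is consistent with the mixed results reported above.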
It would clearly be useful to undertake a study that comprehensively compares the predictive precision of a range of advanced modelling methods that are augmented using the regression kriging methodology. Using an extensive national dataset obtained from New Zealand grown P. radiata, the objectives of this research were to (i) compare precision of models of Site Index and 300 Index created using a wide range of parametric, geospatial and non-parametric methods, (ii) examine how regression kriging influences precision of the base parametric and non-parametric models, (iii) determine if averaging predictions from the best models, using an ensemble approach, improves overall model precision, (iv) extract the variables of key importance from multiple regression and the most precise non-parametric models to gain a greater understanding of the key determinants of these productivity metrics and (v) produce maps of both productivity indices from the most precise models.
Section snippets
Study area and dataset preparation
The study area from which data was sourced encompassed the range of environmental conditions over which P. radiata is grown in the North and South Island of New Zealand (Fig. 1). Stand level data for P. radiata was extracted from the New Zealand Forest Research Institute Permanent Sample Plot database (Pillar and Dunlop, 1990). Plots that received treatments within spacing, disturbance (forest floor removal), oversowing and fertiliser application trials were excluded but the control plots were
Plot representation of environmental conditions
Plot measurements covered a wide variation in environmental conditions and this variation was very similar between the training and test datasets (Table 1). The ranges in mean annual air temperature, total annual rainfall and mean annual windspeed were respectively 7.7 – 16.0 °C, 551 – 3,587 mm and 4.69 – 26.9 km hr⁻¹ (Table 1). Although there are colder, wetter and windier locations throughout New Zealand (Fig. 2) the plots used here encompassed the range in climatic conditions over which P.
Discussion
This study clearly demonstrates the utility of non-parametric models in predicting landscape level variation in productivity metrics. The results also show that more traditional methods such as regression can provide comparable precision when residuals from these models are kriged. Air temperature and water availability were found to be the key determinants of Site Index while 300 Index was also strongly influenced by these variables and to a lesser extent soil fertility and disease
Conclusion
The base model precision of both productivity metrics generally increased with model complexity and the ability of the model to accommodate collinearity. The two regression tree methods (XGBoost and random forest) provided the most precise predictions as both can accommodate complex functional forms and variable interactions, and both are relatively unaffected by the inclusion of many collinear variables in the model. Regression kriging provided significant gains in precision for almost all models
CRediT authorship contribution statement
Michael S. Watt: Conceptualization, Methodology, Software, Formal analysis, Visualization, Writing - original draft, Writing - review & editing, Supervision, Project administration. David J. Palmer: Data curation, Software, Formal analysis, Writing - review & editing. Ellen Mae C. Leonardo: Software, Formal analysis, Visualization, Writing - review & editing. Maxime Bombrun: Data curation, Software, Formal analysis, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We are grateful to Christine Dodunski for her assistance in obtaining permission to use PSP data and for extracting this data. We are also indebted to numerous forestry companies for supporting this research and providing the permission to use the data. The New Zealand Scion Strategic Science Investment Fund (SSIF) was used to fund this project. We are grateful to two anonymous reviewers who provided valuable feedback and suggestions that markedly improved the manuscript.
References (114)
et al. Evaluation of modelling techniques for forest site productivity prediction in contrasting ecoregions using stochastic multicriteria acceptability analysis (SMAA). Environ. Modell. Software (2011)
et al. Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests. Ecol. Model. (2010)
et al. Indices for nitrogen status and nitrate leaching from Norway spruce (Picea abies (L.) Karst.) stands in Sweden. For. Ecol. Manage. (2002)
et al. An erosion model for evaluating regional land-use scenarios. Environ. Modell. Software (2010)
et al. Partial least-squares regression: a tutorial. Anal. Chim. Acta (1986)
et al. Exploring the use of learning techniques for relating the site index of radiata pine stands with climate, soil and physiography. For. Ecol. Manage. (2020)
et al. Productivity of black spruce and Jack pine stands in Quebec as related to climate, site biological features and soil properties. For. Ecol. Manage. (2004)
et al. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma (2004)
et al. Use of a process-based model to describe spatial variation in Pinus radiata productivity in New Zealand. For. Ecol. Manage. (2011)
On the insensitivity of height growth to spacing. For. Ecol. Manage. (1985)
Integrate machine learning and geostatistics for high-resolution mapping of ground-level PM2.5 concentrations
Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach. Environ. Pollut.
Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. Geoderma
Predicting and mapping the soil available water capacity of Australian wheatbelt. Geoderma Reg.
Comparison of spatial prediction techniques for developing Pinus radiata productivity surfaces across New Zealand. For. Ecol. Manage.
Assessing prediction accuracy in a regression kriging surface of Pinus radiata outerwood density across New Zealand. For. Ecol. Manage.
Relationships between soil biota, nitrogen and phosphorus availability, and pasture growth under organic and conventional management. Appl. Soil Ecol.
Predicting productivity of trembling aspen in the Boreal Shield ecozone of Quebec using different sources of soil and site information. For. Ecol. Manage.
Mineralization and nitrification patterns at eight northeastern USA forested research sites. For. Ecol. Manage.
Predicting site index of plantation loblolly pine from biophysical variables. For. Ecol. Manage.
Influence of edaphic factors and tree nutritive status on the productivity of Pinus radiata D. Don plantations in northwestern Spain. For. Ecol. Manage.
Assessing the quality of permanent sample plot databases for growth modelling in forest plantations. For. Ecol. Manage.
Evaluating digital soil mapping approaches for mapping GlobalSoilMap soil properties from legacy data in Languedoc-Roussillon (France). Geoderma Reg.
Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma
Comparing parametric and non-parametric methods of predicting Site Index for radiata pine using combinations of data derived from environmental surfaces, satellite imagery and airborne laser scanning. For. Ecol. Manage.
Variance inflation factor: as a condition for the inclusion of suppressor variable (s) in regression analysis. Open J. Statistics
Empirically downscaled temperature scenarios for northern Europe based on a multi-model ensemble. Climate Res.
Hands-on machine learning with R
Forest-Scale Phenotyping: Productivity Characterisation Through Machine Learning. Frontiers Plant Sci.
Predictive approaches to forest site productivity: recent trends, challenges and future perspectives. Forestry
Random forests. Machine Learning
Classification and regression trees
Principles of geographical information systems
Trembling aspen site index in relation to environmental measures of site quality at two spatial scales. Can. J. For. Res.
Xgboost: A scalable tree boosting system
Estimating site index from ecological factors for industrial tree plantation species in Mindanao, Philippines. Bull. Univ. Tokyo For.
Comparison of point forecast accuracy of model averaging methods in hydrologic applications. Stoch. Env. Res. Risk Assess.
Prediction of topsoil texture for Region Centre (France) applying model ensemble methods. Geoderma
Roll out of erosion models for Regional Councils. Landcare Res. Contract Report LC0708/094
Beziehungen zwischen bestandshöhe und bestandsmasse. Allgemeine Forst-und Jagdzeitung
Modelling the Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) site index from site factors in Portugal. Forestry
Multivariate adaptive regression splines. Annals Statistics
Greedy function approximation: a gradient boosting machine. Ann. Stat.
A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res.
Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graphical Statistics
The long-term effects of land-use history on nitrogen cycling in northern hardwood forests. Ecol. Appl.
Measurement of trees. Section 6.5 of the NZIF Forestry Handbook