Research Article
Corresponding author: César Capinha (ccapinha@cibio.up.pt). Academic editor: Petr Pyšek
© 2018 César Capinha, Franz Essl, Hanno Seebens, Henrique Miguel Pereira, Ingolf Kühn.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Capinha C, Essl F, Seebens H, Pereira HM, Kühn I (2018) Models of alien species richness show moderate predictive accuracy and poor transferability. NeoBiota 38: 77-96. https://doi.org/10.3897/neobiota.38.23518
Robust predictions of alien species richness are useful to assess global biodiversity change. Nevertheless, the capacity to predict spatial patterns of alien species richness remains largely unassessed. Using 22 data sets of alien species richness from diverse taxonomic groups and covering various parts of the world, we evaluated whether different statistical models were able to provide useful predictions of absolute and relative alien species richness, as a function of explanatory variables representing geographical, environmental and socio-economic factors. Five state-of-the-art count data modelling techniques were used and compared: Poisson and negative binomial generalised linear models (GLMs), multivariate adaptive regression splines (MARS), random forests (RF) and boosted regression trees (BRT). We found that predictions of absolute alien species richness had a low to moderate accuracy in the region where the models were developed and a consistently poor accuracy in new regions. Predictions of relative richness were more accurate in both geographical settings, but were still not good. Flexible tree ensembles-type techniques (RF and BRT) were shown to be significantly better in modelling alien species richness than parametric linear models (such as GLM), despite the latter being more commonly applied for this purpose. Importantly, the poor spatial transferability of models also warrants caution in assuming the generality of the relationships they identify, e.g. when applying projections under future scenario conditions. Ultimately, our results strongly suggest that the predictability of spatial variation in alien species richness is limited. The somewhat more robust ability to rank regions according to the number of aliens they have (i.e. relative richness) suggests that models of alien species richness may be useful for prioritising and comparing regions, but not for predicting exact species numbers.
biological invasions, clamping, model evaluation, predictive modelling, transferability
Knowing the distribution patterns of alien species richness is increasingly crucial for assessing and monitoring global biodiversity (
Despite substantial progress being made in the availability of alien species distributions, there still are numerous regions worldwide for which alien species richness data are lacking or highly incomplete (
Despite the relevance of information on alien species richness, little work has been done to assess whether alien species richness can be accurately predicted for areas where such data are lacking. If this metric can be predicted accurately, then available data can be used to geographically broaden the current knowledge on biodiversity patterns and to support conservation and alien species management decisions in areas that are currently not surveyed. Further, high reliability of predictive models would enable the integration of predictive modelling in alien species mapping.
Two main lines of modelling approaches are used for predicting alien species richness. The first consists of the use of stacked species distribution models (e.g.
Species distribution modelling has been intensively applied to alien species in recent years and modelling practices are increasingly refined (e.g.
Here, we perform a formal evaluation of the ability of species richness models to predict the richness of alien species. We measure and compare the predictive accuracies of five modelling techniques extensively used in ecology: i) a Generalised Linear Model (GLM) using a Poisson distribution (GLM-P), ii) a GLM using a negative binomial distribution (GLM-NB), iii) boosted regression trees (BRT), iv) multivariate adaptive regression splines (MARS) and v) random forests (RF). We assess the ability of the modelling techniques to predict within the geographical range of the model’s calibration data (i.e. “geographical interpolation”) and in new, spatially independent regions (i.e. transferability or “geographical extrapolation”;
We collected 22 typical data sets of alien species richness from previous studies (Table
Characteristics of the datasets of alien species richness used for predictive modelling.
Number | Data source | Taxonomic group | Geographical coverage | Types of regions | Number of regions* | Mean richness |
---|---|---|---|---|---|---|
1 | | Amphibians | Europe | Countries, administrative subdivisions and islands | 26 | 1.7 (SD = 1.5) |
2 | | Amphibians | Global | Countries, administrative subdivisions and islands | 337 | 0.8 (SD = 1.4) |
3 | | Ants | Global | Countries, administrative subdivisions and islands | 345 | 9.3 (SD = 9.8) |
4 | | Birds | Europe | Countries, administrative subdivisions and islands | 52 | 6.5 (SD = 5.5) |
5 | Blackburn et al. 2015 | Birds | Oceanic islands worldwide | Islands | 56 | 8.5 (SD = 11.1) |
6 | | Birds | Global | Countries, administrative subdivisions and islands | 484 | 4.7 (SD = 7) |
7 | | Bryophytes | Europe | Countries, administrative subdivisions and islands | 32 | 2 (SD = 2.6) |
8 | | Bryophytes | Global | Countries, administrative subdivisions and islands | 78 | 2.5 (SD = 4) |
9 | | Conifers | Temperate and subtropical regions | Countries, administrative subdivisions and islands | 60 | 5.5 (SD = 4.9) |
10 | | Flowering plants | México | Administrative divisions | 32 | 158.6 (SD = 64.4) |
11 | | Fungi | Europe | Countries, administrative subdivisions and islands | 51 | 19.9 (SD = 16.6) |
12 | | Mammals | Europe | Countries, administrative subdivisions and islands | 45 | 5.3 (SD = 2.6) |
13 | | Mammals | Global | Countries, administrative subdivisions and islands | 484 | 7.9 (SD = 4.5) |
14 | | Plants | Oceanic islands worldwide | Islands | 49 | 269.4 (SD = 346.3) |
15 | | Plants | Europe | Countries, administrative subdivisions and islands | 39 | 339.4 (SD = 269.1) |
16 | | Reptiles | Europe | Countries, administrative subdivisions and islands | 37 | 1.3 (SD = 2.1) |
17 | | Reptiles | Global | Countries, administrative subdivisions and islands | 337 | 2.2 (SD = 4) |
18 | | Spiders | Global | Countries, administrative subdivisions and islands | 307 | 6.2 (SD = 6.4) |
19 | | Terrestrial gastropods | Global | Countries, administrative subdivisions and islands | 51 | 13.6 (SD = 7.1) |
20 | | Terrestrial insects | Europe | Countries, administrative subdivisions and islands | 53 | 163.3 (SD = 133.2) |
21 | | Vascular plants | Europe | Countries, administrative subdivisions and islands | 20 | 252 (SD = 185) |
22 | van Kleunen et al. 2015 | Vascular plants | Global | Countries, administrative subdivisions and islands | 525 | 257.6 (SD = 307.4) |
To have a common geographical basis for assigning values of the predictor variables (below), we matched each region with the corresponding polygon of the Global Administrative Areas Database v.2.8 (GADM; http://www.gadm.org/). The GADM is the most detailed delimitation of worldwide administrative divisions available. We excluded all regions that we could not identify unambiguously, that had no geographical match in GADM, or that were smaller than 1 km² (the highest spatial resolution provided by the gridded predictor variables; see below). In some cases, this resulted in reduced numbers of records compared to the original datasets (62% to 100% of the records in the original datasets kept for our analyses, average = 92% ± 10.2%). Most data sets in our collection contain a relatively low number of regions, with 15 datasets consisting of fewer than 100 regions and 7 of fewer than 50 regions (Table
We selected nine explanatory variables representing factors that have been shown in previous studies to explain the variation in alien species richness (
Geodesic area (km²) was measured using the spatial polygon of the region after re-projection to a Mollweide equal area projection. Insularity was a binary variable (islands or mainland regions). Mean annual temperature and mean annual precipitation represent region-wide averages of the corresponding climatic conditions (at ca. 1×1 km) derived from WorldClim (http://www.worldclim.org/). We defined bioclimatic diversity as the total number of distinct bioclimatic types delimited by
We performed all data processing in R (v. 3.4.1) (www.R-project.org/). The extraction of values from the source gridded datasets was done using the ‘extract’ method of the RASTER (v. 2.3-40) package.
We tested five techniques for modelling alien species richness: i) GLM-P using a Poisson distribution, ii) GLM-NB using a negative binomial distribution, iii) boosted regression trees using a Poisson distribution (BRT), iv) multivariate adaptive regression splines (MARS) and v) random forests (RF). These methods were selected because they fall into different positions along the spectrum of statistical assumptions and modelling architectures, allowing a number of relevant comparisons. These include i) a comparison of GLMs having a restrictive (GLM-P) and a more relaxed distributional assumption (GLM-NB); ii) comparison of GLMs with machine learning models (BRT, MARS and RF) and iii) comparison of a linear regression-type machine learning model (MARS) with tree ensembles-type machine learning models (BRT, RF). We briefly describe each of the modelling techniques used in the Suppl. material
We implemented GLM-P using the standard ‘glm()’ function of R and GLM-NB using the ‘glm.nb()’ function of the MASS (v. 7.3–37) package. The theta parameter, which represents the dispersion of the data in the calculation of the variance of the NB distribution, was estimated by means of maximum likelihood. An important step in the application of GLMs is to identify the ‘best’ combination of predictors (
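For reference, the role of theta can be written out explicitly. The expression below uses the parameterisation documented for ‘glm.nb()’ in MASS; the symbols Y (the alien species richness of a region) and μ (its fitted mean) are introduced here for illustration only and are not notation from the original analysis.

$$\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta}$$

As θ grows, the variance approaches μ, which is the Poisson case assumed by GLM-P; small values of θ correspond to strong overdispersion.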
Hurdle models (
Multivariate adaptive regression splines (MARS;
Random forests (RF;
To implement Boosted regression trees (
The performance of the previous techniques in predicting alien species richness for each of the 22 datasets was assessed using two distinct approaches. The first was a leave-one-out cross-validation (
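The difference between the two validation designs can be illustrated with a minimal sketch of how calibration and evaluation records are separated. It is written in Python for illustration only (the analyses themselves were run in R). The function names and the fold labels are hypothetical, and how regions were actually assigned to the four regional folds is not shown here, so the labels are simply taken as given.

```python
def leave_one_out_splits(n_regions):
    """Yield (calibration, evaluation) index lists, holding out one region at a time."""
    for held_out in range(n_regions):
        calibration = [i for i in range(n_regions) if i != held_out]
        yield calibration, [held_out]


def regional_fold_splits(fold_labels):
    """Yield (calibration, evaluation) index lists, holding out one whole fold at a time.

    fold_labels is a hypothetical list giving, for each region, the spatial
    fold it belongs to (for example one of four geographical blocks).
    """
    for fold in sorted(set(fold_labels)):
        evaluation = [i for i, label in enumerate(fold_labels) if label == fold]
        calibration = [i for i, label in enumerate(fold_labels) if label != fold]
        yield calibration, evaluation
```

In the first design every evaluation region is surrounded by calibration regions from the same geographical range; in the second, each evaluation fold is spatially separated from the data used to fit the model, which is what makes it a test of transferability.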
The assessment of validation accuracy was made for two criteria: (1) agreement between reported and predicted absolute values of alien species richness and (2) agreement between the rank order of reported and predicted alien species richness. For the first criterion, we calculated the ‘relative absolute error’ (RAE). An RAE of zero represents a perfect match between predicted and observed values, while 100% corresponds to the level of error that is obtained if all predictions simply represent the average of the alien species richness values used for evaluation (
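As a minimal sketch of the two accuracy criteria, the Python code below computes RAE and Spearman's rank correlation (ρ) using the standard library only. The RAE formula used is the usual one (summed absolute error divided by the summed absolute error of a mean-only predictor), which matches the description above; the function names are illustrative and this is not the code used in the original analysis.

```python
from statistics import mean


def relative_absolute_error(observed, predicted):
    """RAE in percent: 0 is a perfect match, 100 equals a mean-only predictor."""
    baseline = mean(observed)
    model_error = sum(abs(p - o) for o, p in zip(observed, predicted))
    baseline_error = sum(abs(o - baseline) for o in observed)
    if baseline_error == 0:
        raise ValueError("RAE is undefined when all observed values are equal")
    return 100.0 * model_error / baseline_error


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(observed, predicted):
    """Spearman's rho: Pearson correlation between the two sets of ranks."""
    rank_o, rank_p = average_ranks(observed), average_ranks(predicted)
    mean_o, mean_p = mean(rank_o), mean(rank_p)
    covariance = sum((a - mean_o) * (b - mean_p) for a, b in zip(rank_o, rank_p))
    spread_o = sum((a - mean_o) ** 2 for a in rank_o)
    spread_p = sum((b - mean_p) ** 2 for b in rank_p)
    if spread_o == 0 or spread_p == 0:
        return float("nan")  # rank order is undefined if one side is constant
    return covariance / (spread_o * spread_p) ** 0.5
```

A model can rank regions well (high ρ) while still missing the absolute numbers (high RAE), which is why the two criteria are reported separately.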
Two distinct evaluation assessments were made for GLM-P, GLM-NB and MARS, in order to account for their greater susceptibility to errors when predicting beyond the sampling space of the calibration data (i.e. when extrapolating). While BRT and RF ‘do not extrapolate’ because they use the closest known subspace of the calibration data as target for the prediction (
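One common way of limiting extrapolation is to ‘clamp’ the predictors before predicting. The Python sketch below assumes that clamping means resetting any predictor value in the evaluation data that falls outside the range of the calibration data to the nearest limit of that range; this definition, the function name and the dictionary-based data layout are assumptions made for illustration.

```python
def clamp_predictors(evaluation_rows, calibration_rows):
    """Return evaluation rows with each predictor limited to the calibration range.

    Both arguments are lists of dicts mapping predictor names to numbers
    (an assumed layout, one dict per region).
    """
    names = list(calibration_rows[0].keys())
    # Range of each predictor as sampled by the calibration data.
    bounds = {
        name: (
            min(row[name] for row in calibration_rows),
            max(row[name] for row in calibration_rows),
        )
        for name in names
    }
    clamped = []
    for row in evaluation_rows:
        clamped.append(
            {name: min(max(row[name], bounds[name][0]), bounds[name][1]) for name in names}
        )
    return clamped
```

Values already inside the calibration range are left unchanged, so clamping only affects regions for which the model would otherwise have to extrapolate.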
Multiple pairwise Wilcoxon tests were used to test for significant differences in RAE and ρ between the five modelling techniques. The differences were assessed by comparing the performances achieved by each method in the 22 datasets of alien species richness tested. In the case of GLM-P, GLM-NB and MARS, pairwise Wilcoxon tests were also used to compare the accuracy of the clamped and unclamped versions of the predictions.
Spatial autocorrelation (SAC) in the distribution of alien species richness may lead to incorrect model parameter estimates (
Results for the leave-one-out cross-validation, which assesses the accuracy of predictions made within the geographical range of the data, show that RF and BRT performed best (Figure
Detailed legend: Accuracy was measured for generalised linear models using Poisson (GLM-P) and negative binomial distributions (GLM-NB), boosted regression trees (BRT), random forests (RF) and multivariate adaptive regression splines (MARS) for predictions of the total number of alien species per region (relative absolute error, RAE; lower is better) (a, b) and the rank order of each region (Spearman’s rho; higher is better) (c, d). Boxplots represent variations in accuracy across 22 datasets of alien species richness for GLM-P, GLM-NB, RF and MARS, but not for BRT. Due to model convergence issues, results for BRT comprise only a subset of datasets and are thus not directly comparable with the results of the other techniques. Panels on the left (a, c) refer to predictions evaluated using a leave-one-out approach, which measures the accuracy of predictions within the geographical range of the model calibration data. Panels on the right (b, d) refer to predictions evaluated using a four-fold regional cross-validation approach, which assesses the spatial transferability of the models. A few outliers lie outside the ranges of the Y-axes, see Tables
Results of pairwise Wilcoxon tests of significant differences for the performance of the techniques for predicting absolute richness (as measured by relative absolute error) using leave-one-out cross-validation (A) and regional cross-validation (B). Predictions of GLM-P, GLM-NB and MARS refer to models using ‘clamped’ data (see main text). Significant differences (at α = 0.05) are shown in bold.
A | GLM-P | GLM-NB | MARS | RF |
---|---|---|---|---|
GLM-P | ‒ | | | |
GLM-NB | 0.341 | ‒ | | |
MARS | 0.33 | 0.103 | ‒ | |
RF | **< 0.001** | **< 0.001** | **0.02** | ‒ |
BRT | **0.003** | **< 0.001** | **0.036** | 0.953 |

B | GLM-P | GLM-NB | MARS | RF |
---|---|---|---|---|
GLM-P | ‒ | | | |
GLM-NB | 0.622 | ‒ | | |
MARS | 1 | 0.597 | ‒ | |
RF | 0.058 | **0.011** | **0.04** | ‒ |
BRT | 0.263 | 0.561 | 0.159 | **0.004** |
The application of data ‘clamping’ in predictions of GLM-P, GLM-NB and MARS resulted in improvements in predictive performance for nearly all datasets (Suppl. material
The relative order of regions in alien species richness was most accurately predicted by RF (median ρ = 0.63), closely followed by BRT (median ρ = 0.62). The performance of RF differed significantly from that of GLM-P, GLM-NB and MARS (p < 0.05; Wilcoxon rank-sum test), but not from that of BRT (p > 0.05). BRT was also significantly better at predicting the relative order of regions in terms of alien species richness than the two GLM-type models (p < 0.05). The application of clamping to GLMs and MARS did not significantly alter their accuracy (p > 0.05; Wilcoxon rank-sum test). A reasonable number of data sets achieved high (ρ > 0.6) degrees of correlation between the predicted and observed order of regions, particularly for the two best performing techniques (RF and BRT), whereas weak (ρ < 0.25) correlations were less common (Figure
Results for the 4-fold regional cross-validation, which assesses the transferability of model predictions, showed a consistently worse predictive accuracy than the one evaluated by leave-one-out cross-validation. All modelling techniques showed substantially higher medians of RAE (Figure
The least inaccurate technique was RF (median RAE = 95.1%; interquartile range = 25.5%) (Figure
As with leave-one-out cross-validation, the application of data ‘clamping’ in predictions of absolute alien species richness by GLM-P, GLM-NB and MARS resulted in clear increases in predictive performance for nearly all datasets (Suppl. material
For predictions of the relative order of regions in alien species richness (ρ), no method emerged as best performing (p > 0.05; Wilcoxon rank-sum test). The application of clamping did not significantly alter the results (p > 0.05; Wilcoxon rank-sum test). Most datasets showed moderate to low (ρ < 0.45) degrees of correlation between the predicted and observed order of regions (Figure
Our results show that values of alien species richness can be predicted with reasonable to moderate accuracy within the geographical range of the model calibration data, but only poorly in regions outside this range. This drop in predictive power was verified across modelling techniques and concerned the capacity to predict both absolute alien species richness and relative alien species richness.
The poor transferability of statistical models is not unexpected because the relationships they identify are not functional (mechanistic) and may thus be limited in their realism outside the space of the calibration data. Issues related to transferability have been well documented and examined for species distribution models (e.g.
Another possible reason for the poor transferability of the alien species richness models is that the relationships (i.e. the covariance structure) between predictor and response variables and amongst response variables are not conserved in the areas that lie beyond the spatial range of the calibration data (
A third possible cause for the observed poor transferability of models concerns extrapolation, which is also related to the information content in the model calibration data. Predictions made for conditions out of the range of the calibration data are extremely challenging, no matter the modelling technique used (
Good transferability of models of alien species richness may not be required if predictions or model-based inferences are intended for the geographical range of the sampling data. Our results show that, under these settings, models of alien species richness can achieve moderate (e.g. RAE ≈ 75% and ρ ≈ 0.6) predictive accuracy. However, one of our most prominent results was that predictions from RF and BRT, despite not being good, were significantly better than those from GLM-type models and MARS. This held even after data clamping was applied to GLMs and MARS, which allows the higher susceptibility of these models to extrapolation errors to be ruled out as the cause.
It is not unexpected to find RF and BRT outperforming GLMs in non-transferred predictions. The model fitting process of the former techniques consists of iteratively fitting the data and testing the ability of the fitted relationships to predict portions of the data left out from the fitting. The leave-one-out cross-validation mimics this procedure, differing mainly in that the error levels measured are not used to retune the model. Hence, machine learning techniques are specifically optimised to predict well, based on the patterns sampled by the data. Besides, the capacity of machine learning techniques to fit complex functions could be particularly relevant for models of alien species richness, because the relationships between variables in these models are often fitted along wide gradients (such as for global-scale environmental and socio-economic variation; e.g.
Similarly to what was verified in the predictions of transferred models, extrapolation also severely harmed predictions made for in-sample geographical ranges. Ideally, extrapolation should be overcome by the use of additional data, sampling the extrapolating predictors’ space. However, when that is not possible, our results show that the use of clamping is strongly recommended. Further benefits could also be expected from the examination of the conditions leading to extrapolation, such as the identity of the predictors involved and of how far the model has to extrapolate in the predictors’ space. This has been assessed in SDMs previously (see, for instance,
Overall, our results suggest that accurate predictions of regional alien species richness from correlative models are beyond the scope of the models we used. This is particularly the case for absolute values of richness, whereas relative richness, despite not achieving good overall accuracy, proved more robust to errors. Here we analysed the transferability of models of species richness between regions. A complementary approach, which recognises species identity explicitly and hence also allows the analysis of species turnover, is to model compositional similarity (e.g.
We showed that regional alien species richness cannot be predicted reliably using the data and methods typically found in the literature. Given that these data and methods already reflect the best options available to modellers, in the near future the coverage of information gaps on alien species richness is likely to remain entirely dependent on the publication and updating of alien species inventories, which reinforces recent calls for the publication of this information (
Two of our results are also of relevance for descriptive models of alien species richness. First, we found that tree ensembles-type modelling techniques (RF and BRT) are consistently better at predicting non-transferred values of richness than GLM and MARS. This supports the view that flexible, non-linear models are better able to capture information from the data than GLM, a more commonly used technique. The common justification for the use of GLM-type models for analysing alien species richness concerns their ease of interpretation. However, a number of methods have recently been developed to improve the interpretability of tree ensembles (e.g.
We acknowledge the comments from one anonymous reviewer and Cang Hui who helped to improve the manuscript. CC was supported by a postdoctoral grant from FEDER Funds through the Operational Competitiveness Factors Programme “COMPETE” and by National Funds through the Foundation for Science and Technology (FCT) within the framework of project “PTDC/AAG-GLO/0463/2014-POCI-01-0145-FEDER-016583”. FE acknowledges funding from the Austrian Science Fund (FWF, grant I2086-B16). HS was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG, grant SE 1891/2-1).
Table A1–A5