Origin of climatic data can determine the transferability of species distribution models

Methodological research on species distribution modelling (SDM) has so far largely focused on the choice of appropriate modelling algorithms and variable selection approaches, but the consequences of choosing amongst different sources of environmental data has scarcely been investigated. Bioclimatic variables are commonly used as predictors in SDMs. Currently, several online databases offer the same sets of bioclimatic variables, but they differ in underlying source of raw data and method of data processing (extrapolation and downscaling). In this paper, we asked whether predictive performance and spatial transferability of SDMs are affected by the choice of two different bioclimatic databases viz. WorldClim 2 and Chelsa 1.2. We used presence-absence data of the invasive plant Ageratina adenophora from the Western Himalaya for training SDMs and a set of independently-collected presence-only datasets from the Central and Eastern Himalaya to evaluate the transferability of the SDMs beyond the training range. We found that the performance of SDMs was, to a large degree, affected by the choice of the climatic dataset. Models calibrated on Chelsa 1.2 outperformed WorldClim 2 in terms of internal evaluation on the calibration dataset. However, when the model was transferred beyond the calibration range to the Central and Eastern Himalaya, models based on WorldClim 2 performed substantially better. We recommend that, in addition to the choice of predictor variables, the choice of predictor datasets with these variables should not be based merely on subjective decision whenever several options are available. Instead, such decisions should be based on robust evaluation of the most appropriate dataset for a given geographic region and species being modelled. Moreover, decisions could also depend on the objective of the study, i.e. projecting within the calibration range or beyond. Therefore, a quantitative evaluation of predictor datasets from alternative sources should be routinely performed as an integral part of the modelling procedure. Copyright Arunava Datta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. NeoBiota 59: 61–76 (2020) doi: 10.3897/neobiota.59.36299 http://neobiota.pensoft.net ReseARCh ARtiCle Advancing research on alien species and biological invasions A peer-reviewed open-access journal


introduction
Correlative species distribution models (SDMs, also referred to as ecological niche models or habitat suitability models) are used to estimate the potential geographic distribution of species by modelling the relationship between known occurrences of a species with its environmental conditions (Guisan and Zimmermann 2000;Pearson and Dawson 2003;Elith and Leathwick 2009). These models directly relate the occurrence of a species to its realised multi-dimensional niche (Hutchinson 1957;Pearson and Dawson 2003) in the environmental space (Soberón and Nakamura 2009;Peterson et al. 2011) that is provided by the chosen predictor variables. Climatic conditions are crucial in determining the large-scale distribution patterns of organisms (Woodward 1987;Woodward et al. 2004) and are hence widely used for modelling species distributions (Pearson and Dawson 2003).
SDMs are frequently applied in invasion biology, conservation biology, evolutionary biology and agriculture due to their versatility (Elith and Leathwick 2009;Peterson et al. 2011). SDMs of invasive species are often used to make temporal and spatial predictions of climatically-suitable regions that could potentially be invaded (Thuiller et al. 2005;Ervin and Holly 2011;Jaryan et al. 2013) and thus aid in early detection, control and eradication of the invasive species (Thuiller et al. 2005;Peterson et al. 2011). The distribution of invasive plants will most likely change due to climate change and therefore future projections of invasion from SDMs will further help in taking long-term management decisions (Thuiller et al. 2005;Peterson et al. 2011).
To avoid misleading recommendations for such management decisions, SDMs and the resulting predictions or future projections of suitable environmental conditions and corresponding invasion risks need to be highly reliable. Much of past research has focused on the development of modelling algorithms and model (i.e. variable) selection to increase the performance of SDMs (Guisan and Zimmermann 2000;Elith and Leathwick 2009). Ample studies are available on different methodological aspects, such as the choice of different modelling algorithms, sample size, sample density, variable selection and spatial resolution of environmental layers on model accuracy and transferability Peterson and Nakazawa 2008;Heikkinen et al. 2012;Wenger and Olden 2012).
Model transferability, either in space or time Elith and Leathwick 2009), is of particular importance for invasive species to reliably assess their response to climate change or to predict their invasive potential in novel areas and for corresponding management decisions (Clark et al. 2001;Yates et al. 2018). Therefore, it is essential to assess the predictive accuracy of an SDM, not only within the region in which it was fitted (i.e. internal validation within the calibration range), but also in a geographic region different from the calibration range (i.e. external validation on an independent dataset) (Heikkinen et al. 2012;Wenger and Olden 2012;Fernández and Hamilton 2015). The model transfer may often involve extrapolation if the ranges of the predictors are beyond the calibration range of the model. Model transferability is a particularly challenging issue in species distribution modelling (Araújo and Guisan 2006;Elith and Leathwick 2009;Peterson et al. 2011;Wenger and Olden 2012). A recent review on challenges in transferability of ecological models has flagged many pertinent issues, such as the choice of response variables, sampling bias, choice of modelling algorithm and non-stationarity etc. (Yates et al. 2018).
It has also been shown that the choice of predictor variables can impact model accuracy and transferability Karger et al. 2017;), but studies, focusing exclusively on the consequences of choosing different sources providing the same set of predictor variables, are very scarce (Peterson et al. 2011). Consequently, researchers often rely on their subjective decisions for choosing one source of predictor datasets over others, even if the same set of (potential predictor) variables are available from different sources.
SDMs have increasingly benefitted from the availability of climatic predictors at very high resolutions in the form of rasterised GIS layers available from different sources (Soberón and Nakamura 2009;Peterson et al. 2011). Despite offering the same variables, such different climatic databases could differ in their actual values since they rely on different source data and use different interpolation or downscaling algorithms (Bobrowski and Schickhoff 2017; Karger et al. 2017). Such differences could be particularly relevant in regions of high orographic heterogeneity, which have been shown to be highly sensitive to prediction errors for multiple plant species (Hanspach et al. 2011).
The most widely-used variables for SDMs are the set of 19 bioclimatic variables (Peterson and Nakazawa 2008; O'Donnell and Ignizio 2012) that do not only include annual averages, but also climatic extremes limiting the physiological performance of biological organisms (O'Donnell and Ignizio 2012). Currently, several databases offer free access to these bioclimatic variables. WorldClim was one of the first and most frequently used high resolution (30 arc seconds) global bioclimatic dataset derived from ground weather stations across the globe and interpolated by using latitude, longitude and elevation as independent variables (Hijmans et al. 2005). In the recent version of WorldClim (Version 2; Fick and Hijmans 2017), hereafter referred to as WorldClim 2, satellite-derived covariates, such as land surface temperature and cloud cover, have also been used in the interpolation process to improve the data quality in areas where ground observations are scarce. Chelsa (Version 1.2; Karger et al. 2017), hereafter referred to as Chelsa 1.2, is another bioclimatic database that accounts for orographic patterns of precipitation in mountainous terrains, i.e. it accounts for factors, such as aspect and valley exposition by including wind effects (see Karger et al. 2017). Therefore, it can be assumed that, due to the methodological differences in generating the raster layers, these databases are not equivalent and hence their use in SDMs could result in differences in predictive accuracy and, moreover, in transferability.
In this paper, we asked, whether models calibrated on Chelsa 1.2 and WorldClim 2, respectively, differ in terms of internal and external predictive performance. To this end, we used the invasive plant species Ageratina adenophora (Spreng.) R.M.King & H.Rob. in the Himalaya as our study system. Using presence-absence data of A. adenophora from the Western Himalaya as the response, we calibrated generalised linear models on Chelsa1.2 and WorldClim2 data. Transferability of models calibrated on these two datasets was evaluated using an independent set of presence-only data from Central and Eastern parts of the Himalaya.

Target species
Ageratina adenophora (Crofton weed, Asteraceae) is a plant species native to Mexico and invasive (or even noxious) in more than 30 countries in subtropical regions across the globe (Auld and Martin 1975;Qiang 1998;Tian et al. 2007;Muniappan et al. 2009;Poudel et al. 2019). It is a multi-stemmed, perennial herb or undershrub that grows up to 2 metres and flowers profusely in spring (Tripathi et al. 2012). It was introduced as an ornamental plant to England in the 19 th century (Auld and Martin 1975) and was later introduced in different parts of the world (Muniappan et al. 2009), such as the Himalaya (Dehradun, India) in the early 20 th century (Datta et al. 2017). In South Asia, it has expanded its distribution almost throughout the subtropical and sub-temperate belts of the Himalaya, ranging from Arunachal Pradesh in the east to Himachal Pradesh in the west (Raizada 1976;Tripathi et al. 2012) and also flourishes in mountains of peninsular India (Muniappan and Viraktamath 1993;Muniappan et al. 2009).

Study area and distribution survey
Our study was carried out in a region of the Western Himalaya (Singh and Singh 1987) between 29.96N and 32.55N latitudes and 75.77E and 78.43E longitudes. Our study area covered five provinces in north-western India and stretched from Dhauladhar range (Himachal Pradesh province) in the west to the mountains of Gharwal region (Uttrakhand province) in the east. We also covered a considerable part of lowlying foothills of the Himalaya (Siwalik range).
We haphazardly surveyed 389 locations and recorded the presence or absence of A. adenophora in the subtropical and temperate zones of the Western Himalaya between 300 m to 3000 m elevation (Fig.1). We targeted this elevational belt based on prior knowledge about the distribution of the plant from previous reconnaissance surveys and existing literature on its distribution (Datta et al. 2017). The surveys were conducted in the vegetation periods of 2014 and 2015. Most of the surveys were carried out along road-and riversides as these are conduits for dispersal of propagules and are also initial establishment sites of A. adenophora (Z. Wang et al. 2011). However, many high elevational areas beyond 2500 m were not accessible by road and, hence, we used trekking trails for surveying such remote locations. Alpine were obtained for evaluating the transferability of the species distribution models trained in the Western Himalaya.The relief map of the region is depicted in brown. The relief map was made with layer obtained from Natural Earth and the international borders were digitized from political map of India (9 th edition) published by survey of Inida.. and subalpine regions (> 3500 m) were not surveyed since the plant is known to be entirely absent from these regions due to extremely low temperatures (Datta et al. 2017). At the scale of the used climatic variables (30 arc seconds or 1 km 2 ), microclimatic variations due to trails, roads and water conduits are not a limiting factor for the distribution of the species. Hence, this potential bias in data acquisition should not influence the model outcome and general conclusions. To assess model transferability, we used an independent set of presence-only records (N = 85) that were collected by experts (see acknowledgements) from Central and Eastern Himalaya (Fig. 1).

Climatic data and variable selection
Due to collinearity amongst the 19 bioclimatic variables, we used a cluster analysis to select variables seperately for WorldClim 2 and Chelsa 1.2 (Dormann et al. 2013). All the 19 bioclimatic variables were scaled to zero and unit standard deviation prior to cluster analysis. The dendrogram was constructed, based on Spearman's rank correlation (ρ) using UPGMA (unweighted pair-group method with arithmetic averages) agglomeration method. A threshold value of of ρ = |0.7| (Dormann et al. 2013) was used to prune the dendrogram and select variables that were not highly collinear. This procedure resulted in five clusters for WorldClim1.2 and seven clusters for Chelsa 2 (see Suppl. material 1). Selection of a variable within a cluster was primarily based on its ecological relevance to the study species. For example, the plant is known to be limited by low temperatures in higher elevations (Datta et al. 2017), therefore the minimum temperature of the coldest month was selected (bio 6). Similarly, germination of seeds is known to be limited by moisture in the lower elevations, hence precipitation of the driest month (bio 14) was preferred over other variables (Datta et al. 2017). In order to make the models based on WorldClim 2 and Chelsa 1.2 comparable, we ensured that the set of selected variables was common. However, due to inherent differences in the correlation sructure, the variable selection procedure yielded slightly different sets of variables for the two datasets. Finally, five variables were selected for WorldClim 2, while two additional variables were selected in the case of Chelsa 1.2 (see Table 1).
In addition to the two models based on WorldClim 2 and Chelsa 1.2 data, we calibrated a third model based on Chelsa 1.2 data, but using the same set of five variables that were selected specifically for WorldClim 2 (Table 1). This allowed us to make direct and unbiased comparison between the predictive performance of WorldClim 2 and Chelsa 1.2 and to assess whether our conclusions are potentially confounded by differences in model performance caused by the initial variable selection procedure.

Modelling procedure
We used a multi-model inference approach (Burnham and Anderson 2002) to arrive at the final model to be used for prediction (Grueber et al. 2011;Symonds and Moussalli 2011;Burnham 2015). The following steps were taken: (1) We fitted generalised linear models with binomial error distribution and a logit link function to the presence-absence data of A. adenophora in the Western Himalaya using previously selected climatic variables (Table 1). All predictor variables were scaled to zero mean and unit standard deviation.
(2) We then obtained models with all possible variable combinations using the "dredge" function in the "MuMIn" package (Barton 2015) of R (R Core Team 2017).
(3) A subset of best models that differed by 2 or less in AIC from the best model was considered for a model averaging process (hereafter referred to as "best subset") (Grueber et al. 2011). (4) We then averaged model coefficients, weighted by the corresponding Akaike weights across all models in the best subset. We used the default "full average" method for calculating the averaged coefficients (if a variable is absent from one of the component models, a parameter estimate of "zero" is substituted in the averaging process (Symonds and Moussalli 2011). This method results in shrinkage of parameter estimates for those variables which are less important (Grueber et al. 2011) and has been suggested when prediction from an averaged model is intended (Symonds and Moussalli 2011).

Model evaluation
To obtain binary predictions (i.e. presence or absence output) from continuous probability values, a threshold was selected by maximising the true skill statistic (TSS), which accounts for both omission and commission errors and is known to be independent of prevalence (Allouche et al. 2006). The value of TSS can range from -1 to +1. A value close to +1 indicates good agreement, while a value close to 0 indicates that the model does not perform better than a random model (Allouche et al. 2006). A value close to -1 suggests that a completely inverse model would be better. AUC is a common traditionally-used metric for evaluating the performance of SDMs; however, its efficiency has been questioned (Jiménez-Valverde et al. 2008) and, therefore, we do not report AUC values.
To assess the transferability (i.e. predictive performance of the model beyond the calibration area in the Western Himalaya), we used the independent set of presenceonly data from the Central and Eastern Himalaya (Nepal, Sikkim, Darjeeling and Bhutan; see acknowledgements for contributors). Since we did not have true absence data from these regions, we could not use ordinary model evaluation metrics such as TSS. Therefore, we used the Boyce Index for assessing transferability (Boyce et al. 2002;Hirzel et al. 2006). The Boyce Index compares the ratio of predicted frequency and expected frequency of evaluation points across the prediction gradient using a moving window approach (Hirzel et al. 2006;Petitpierre et al. 2012). It is a thresholdindependent metric ranging between -1 and +1. Positive values close to 1 indicate very good agreement of observed presences with the model prediction, while values very close to zero indicate that the predictions are not better than random. Negative values of the Boyce Index show that the model is worse than a random model and makes predictions in areas that are not suitable for the species (Hirzel et al. 2006). For this purpose, the region of evaluation was defined by drawing a convex hull around the presence-only evaluation points. The convex hull (polygon) was used to crop the prediction layer (raster) from the model. Subsequently, the predicted occurrence probabilities were used as a measure of "habitat suitability" (x-axis) and were correlated (Spearman's correlation) with the "predicted to expected ratio" (y-axis) calculated from the presence-only evaluation points across the prediction gradient using the moving window approach (Hirzel et al. 2006). The Boyce Index was calculated using the "ecospat.boyce" function of the "ecospat" package (Cola et al. 2017). The Boyce Index was also calculated for internal evaluation (i.e. training range) to facilitate direct comparison between Western and Central and Eastern Himalaya using presence only data.
Further, SDMs were projected to a much larger geographic area (entire South Asia) compared to the training area to allow for a general qualitative assessment (i.e. visual agreement), based on a priori knowledge about the distribution of A. adenophora from existing literature. R codes for the entire analysis can be found in Suppl. material 3.

Results
Here, we report the predictive performance of the three averaged models using the multimodel inference approach. The first model ("WorldClim data -WorldClim variable selection") had two component models (i.e. best subset of models that differed by 2 or less in AIC), the second model ("Chelsa data -Chelsa variable selection") had six component models, while the third model ("Chelsa data -WorldClim variable selection") had four component models. The average value of the coefficients for the bioclimatic variables also differed between the models (Suppl. material 2).
Internal evaluation of the models based on TSS, using presence-absence data, showed that Chelsa performed marginally better than WorldClim ( Table 2). The "Chelsa data -Chelsa variable selection" had the highest TSS value amongst all models, while the "Chelsa data -WorldClim variable selection" performed similar to "World-Clim data -WorldClim variable selection" (Table 2). In contrast, internal evaluation using the Boyce Index (based on presence-only data) revealed that the performance of table 2. Model evaluation metrics for different models using Chelsa 1.2 and WorldClim 2 datasets. Database refers to the climatic database used for modelling (calibration). Variable selection refers to the specific set of variables selected using cluster analysis for Chelsa 1.2 and WorldClim 2 datasets (see Table 1 and method section for further details). Sensitivity is the rate of true positives while specificity is the rate of true negatives. Boyce internal refers to the Boyce Index calculated for the training area and Boyce external refers to the Boyce Index calculated for Central and Eastern Himalaya where the model was transferred to. Chelsa 1.2 and WorldClim 2 are written as Chelsa and WorldClim in the the Chelsa models was marginally lower than the WorldClim models, while "Chelsa data -WorldClim variable selection" had the highest Boyce Index (Table 2).
In contrast to internal model evaluation, transferability of the model beyond the calibration range in the Central and Eastern Himalaya was entirely based on the Boyce Index because we had only presence data from these regions. The Boyce Index was highest for the "WorldClim data -WorldClim variable selection" and was slightly negative for "Chelsa data -Chelsa variable selection". Negative value of Boyce's Index indicated that the model predicted high probability of occurrence even for regions that were almost unsuitable for the species.
The visual inspection of the prediction maps also showed that the "Chelsa data -Chelsa variable selection" model produced extremely unrealistic over-predictions (Fig. 2c). For instance, the model showed most parts of South Asia to be highly suitable for A. adenophora, including warm tropical regions of peninsular India. However, in reality, the species is known to be restricted to moist subtropical and temperate regions found at higher elevations (Muniappan and Viraktamath 1993). To identify whether this over-prediction was simply caused by the selection of variables based on the Chelsa dataset, we also assessed the performance of the "Chelsa data -WorldClim variable selection" model. This increased model performance, measured with the Boyce Index, but stayed considerably below that of the "WorldClim data -WorldClim variable selection" model (Table 2). Further, transferability was slightly improved, although many potentially unsuitable regions in central and southern India were still being predicted as climatically suitable for A. adenophora (Fig. 2d).

Discussion
Using two openly-available bioclimatic datasets, we found that the choice of the climatic dataset had a substantial effect on transferability of SDMs in mountainous regions such as the Himalaya. It is interesting to note that, although the same set of five variables was used in the multimodel inference approach for "WorldClim data -WorldClim variable selection" and "Chelsa data -WorldClim variable selection" models, the number of component models in the "best subset" for "Chelsa data -WorldClim variable selection" was twice the number of models in "WorldClim data -WorldClim variable selection". The contribution of the variables in these two models also differed considerably. For example, in the "WorldClim data -WorldClim variables" model, bio15 was the most important variable, but in the case of "Chelsa data -WorldClim variables", bio12 was the most important variable. This suggests that the difference in predictive power between the two databases is most likely due to the underlying differences in the variables and not due to the modelling approach used by us.
We initially expected that the Chelsea 1.2 dataset would perform very well in mountainous areas because it corrects for orographic patterns of precipitation. Earlier studies, based in the Himalaya and the Swiss Alps, showed that the performance of Chelsa was superior to WorldClim. For example, it has been reported that Chelsa 1 outperformed WorldClim 1.4 in predicting the distribution of tree line forming Himalayan birch in the Himalaya . Karger et al. (2017) also found a marginally superior performance of Chelsa 1 over WorldClim 1.4 in predicting the distribution of 67 plant species from Switzerland. However, unlike our study, none of the previous studies verified the transferability of the models in space using an independent occurrence dataset from a different geographic region.
Our study yielded contrasting results, especially in terms of reliability when models are transferred to other regions. This difference could partly be due to the following reasons: i) earlier studies used older versions of the two climatic databases. WorldClim has considerably updated their data in the latest version (WorldClim 2) by incorporating remotely-sensed variables, such as land surface temperature and cloud cover. This update might have significantly improved the quality of the data in contrast to previous versions. ii) since Chelsa 1.2 has made several corrections to account for orographic patterns, especially in precipitation (Karger et al. 2017), these corrections might have changed the spatial pattern of the correlation structure amongst the variables at a local scale (Mesgaran et al. 2016). Therefore, the transferability of the model might be com-promised when the models are projected to a new region characterised by a different correlation structure amongst the variables.
It is worth noting that the values of TSS were not very high for any of the models, indicating that climatic variables alone are not sufficient in explaining the distribution pattern of A. adenophora. For example, empirical studies have shown that the species has a narrow pH range from slightly acidic to neutral soils (pH 5 to 7) and cannot tolerate highly saline conditions . Moreover, biotic interactions and dispersal limitations are also crucial in determining plant distributions (Soberón and Nakamura 2009;Peterson et al. 2011). Therefore, including such variables could probably help in improving the general model performance and transferability for this species.
Although we found WorldClim 2 to perform better in terms of model transferability, it is premature to give generalised recommendations for preferring one dataset over the other, based on this case study alone. The species being studied and the geographic area of the study may be equally important (Hanspach et al. 2011). Providing a general overview, on how pertinent the problem is or under which conditions it applies for which type of species is beyond the scope of this study. We rather want to highlight the potential problem. We therefore recommend that the evaluation of climatic datasets should be performed routinely as an integral part of a modelling exercise and the database with best predictive performance should be chosen. For application of SDMs within the training and calibration region, internal validation is reliable, although performing an out-of-area cross-validation procedure is preferable when sample size is sufficient (Wenger and Olden 2012). However, if model transfer to a different geographic region is desired, validation against an independent occurrence dataset is highly recommended for choosing the most appropriate source of environmental data for the given study system. Therefore, a quantitative evaluation of predictor datasets from alternative sources should be routinely performed as an integral part of the modelling procedure.