Research Article
Corresponding author: Pablo González-Moreno (p.gonzalez-moreno@cabi.org)
Academic editor: Philip Hulme
© 2019 Pablo González-Moreno, Lorenzo Lazzaro, Montserrat Vilà, Cristina Preda, Tim Adriaens, Sven Bacher, Giuseppe Brundu, Gordon H. Copp, Franz Essl, Emili García-Berthou, Stelios Katsanevakis, Toril Loennechen Moen, Frances E. Lucy, Wolfgang Nentwig, Helen E. Roy, Greta Srėbalienė, Venche Talgø, Sonia Vanderhoeven, Ana Andjelković, Kęstutis Arbačiauskas, Marie-Anne Auger-Rozenberg, Mi-Jung Bae, Michel Bariche, Pieter Boets, Mário Boieiro, Paulo Alexandre Borges, João Canning-Clode, Federico Cardigos, Niki Chartosia, Elizabeth Joanne Cottier-Cook, Fabio Crocetta, Bram D'hondt, Bruno Foggi, Swen Follak, Belinda Gallardo, Øivind Gammelmo, Sylvaine Giakoumi, Claudia Giuliani, Guillaume Fried, Lucija Šerić Jelaska, Jonathan M. Jeschke, Miquel Jover, Alejandro Juárez-Escario, Stefanos Kalogirou, Aleksandra Kočić, Eleni Kytinou, Ciaran Laverty, Vanessa Lozano, Alberto Maceda-Veiga, Elizabete Marchante, Hélia Marchante, Angeliki F. Martinou, Sandro Meyer, Dan Minchin, Ana Montero-Castaño, Maria Cristina Morais, Carmen Morales-Rodriguez, Naida Muhthassim, Zoltán Á. Nagy, Nikica Ogris, Huseyin Onen, Jan Pergl, Riikka Puntila, Wolfgang Rabitsch, Triya Tessa Ramburn, Carla Rego, Fabian Reichenbach, Carmen Romeralo, Wolf-Christian Saul, Gritta Schrader, Rory Sheehan, Predrag Simonović, Marius Skolka, António Onofre Soares, Leif Sundheim, Ali Serhan Tarkan, Rumen Tomov, Elena Tricarico, Konstantinos Tsiamis, Ahmet Uludağ, Johan van Valkenburg, Hugo Verreycken, Anna Maria Vettraino, Lluís Vilar, Øystein Wiig, Johanna Witzell, Andrea Zanetta, Marc Kenis.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
González-Moreno P, Lazzaro L, Vilà M, Preda C, Adriaens T, Bacher S, Brundu G, Copp GH, Essl F, García-Berthou E, Katsanevakis S, Moen TL, Lucy FE, Nentwig W, Roy HE, Srėbalienė G, Talgø V, Vanderhoeven S, Andjelković A, Arbačiauskas K, Auger-Rozenberg M-A, Bae M-J, Bariche M, Boets P, Boieiro M, Borges PA, Canning-Clode J, Cardigos F, Chartosia N, Cottier-Cook EJ, Crocetta F, D’hondt B, Foggi B, Follak S, Gallardo B, Gammelmo Ø, Giakoumi S, Giuliani C, Fried G, Jelaska LS, Jeschke JM, Jover M, Juárez-Escario A, Kalogirou S, Kočić A, Kytinou E, Laverty C, Lozano V, Maceda-Veiga A, Marchante E, Marchante H, Martinou AF, Meyer S, Minchin D, Montero-Castaño A, Morais MC, Morales-Rodriguez C, Muhthassim N, Nagy ZA, Ogris N, Onen H, Pergl J, Puntila R, Rabitsch W, Ramburn TT, Rego C, Reichenbach F, Romeralo C, Saul W-C, Schrader G, Sheehan R, Simonović P, Skolka M, Soares AO, Sundheim L, Tarkan AS, Tomov R, Tricarico E, Tsiamis K, Uludağ A, van Valkenburg J, Verreycken H, Vettraino AM, Vilar L, Wiig Ø, Witzell J, Zanetta A, Kenis M (2019) Consistency of impact assessment protocols for non-native species. NeoBiota 44: 1-25. https://doi.org/10.3897/neobiota.44.31650
Standardized tools are needed to identify and prioritize the most harmful non-native species (NNS). A plethora of assessment protocols have been developed to evaluate the current and potential impacts of non-native species, but consistency among them has received limited attention. To estimate the consistency across impact assessment protocols, 89 specialists in biological invasions used 11 protocols to screen 57 NNS (2614 assessments). We tested whether the consistency in the impact scoring across assessors, quantified as the coefficient of variation (CV), depended on the characteristics of the protocol, the taxonomic group and the expertise of the assessor. Mean CV across assessors was 40%, with a maximum of 223%. CV was lower for protocols with few score levels, for protocols that demanded high levels of expertise, and when the assessors had greater expertise on the assessed species. The similarity among protocols with respect to the final scores was higher when the protocols considered the same impact types. We conclude that all protocols led to considerable inconsistency among assessors. To improve consistency, we highlight the importance of selecting assessors with high expertise, providing clear guidelines and adequate training, and deriving final decisions collaboratively by consensus.
Environmental impact, expert judgement, invasive alien species policy, management prioritization, risk assessment, socio-economic impact
Coupled with the increasing evidence of adverse impacts exerted by some non-native species (NNS) on native species and ecosystems (
Robust NNS impact protocols should ideally result in accurate and consistent impact scores for a species even if applied by different assessors, as long as they have adequate expertise in the assessed species and context. However, despite the importance of consistency in impact protocols, we have little understanding of the patterns in consistency of impact scores across assessors and protocols, and more importantly, of which factors contribute to high levels of consistency. The level of consistency in species scores across assessors may depend on the characteristics of the protocol (e.g. taxonomic and environmental scope, impact types included), but also on the available scientific evidence of impact, and the level of expertise of assessors. For instance, we may expect high consistency (i.e. low impact score variability) across assessors for well-studied species, or when all assessors have an in-depth understanding of the species under consideration.
Several international and national organizations and research groups have developed NNS protocols (Table
Characteristics of impact assessment protocols used in the study. Each protocol is characterized in terms of a) the taxonomic group the protocol could be used for, b) the impact categories included (environmental alone or environmental and socio-economic), c) the final scoring scale (i.e. three levels, five levels, or more than five levels), d) whether the final score is based on the maximum score of impacts (yes/no), e) whether the protocol included questions on species spread as part of a risk assessment (yes/no), f) the number of questions contributing to the final score, and g) the mean assessor expertise on species required to fill the questionnaire (1–5 scale based on 63 online anonymous questionnaire responses).
Protocol | Full name | Taxonomic groups | Impact categories | Final scoring scale | Final scoring based on maximum score | Spread questions included | Number of questions | Expertise on species required | Reference |
---|---|---|---|---|---|---|---|---|---|
BINPAS | Biological Invasion Impact/Biopollution Assessment | Aquatic animals | Environmental | 5 | yes | yes | 5 | 3.50 | ( |
EICAT | Environmental Impact Classification for Alien Taxa | All | Environmental | 5 | yes | no | 9 | 3.37 | ( |
EPPO-EIA | European Plant Protection Organisation-Environmental Impact Assessment for plants (EPPO-EIA-PL) and terrestrial invertebrates (EPPO-EIA-IN) | Terrestrial plants and invertebrates | Environmental | 5 | yes | no | 8 (Plants); 9 (invert.) | 3.16 | ( |
EPPO-PRI | EPPO-Prioritization scheme | Plants | Environmental and socio-economic | 3 | yes | yes | 11 | 3.00 | ( |
FISK (and related) | Fish Invasiveness Screening Kit (FISK); Freshwater Invertebrate Invasiveness Screening Kit (FI-ISK); Marine Fish Invasiveness Screening Kit (MFISK); Marine Invertebrate Invasiveness Screening Kit (MI-ISK) | Aquatic animals | Environmental and socio-economic | 3 | no | yes | 49 | 4.12 | ( |
GABLIS | German-Austrian Black List Information System | All | Environmental | 3 | yes | yes | 12 | 3.22 | ( |
GB-NNRA | Great Britain Non-native Species Risk Assessment | All | Environmental and socio-economic | 5 | no | yes | 33 | 3.90 | ( |
GISS | Generic Impact Scoring System | All | Environmental and socio-economic | >5 (discrete with max 60) | no | no | 12 | 3.46 | ( |
Harmonia+ | Belgian risk screening tools for potentially invasive plants and animals | All | Environmental and socio-economic | >5 (continuous) | yes | yes | 20 | 3.46 | ( |
ISEIA | Belgian Invasive Species Environmental Impact Assessment | All (not marine for this study) | Environmental | 3 | no | yes | 4 | 2.81 | ( |
NGEIAAS | Norway Generic Ecological Impact Assessment of Alien Species | All | Environmental | 5 | yes | yes | 11 | 4.34 | ( |
A few comparative analyses have addressed differences in the structure of impact assessment protocols (
Eleven commonly used scientifically based protocols developed or applied in Europe for the evaluation of NNS impacts were selected for comparison by consensus in the AlienChallenge COST Action workshop in April 2014 by 36 European experts in NNS risk assessments (Rhodes, Greece) (Table
Each protocol was characterized according to several variables (Table
A total of 57 species from different taxonomic groups not native to terrestrial, freshwater, and marine environments in Europe were selected (Suppl. material
There is a large variation in methods to implement the different protocols; some are available as downloadable freeware (-ISK toolkits, the ‘NAPRA’ version of the GB-NNRA), as online applications (e.g. Harmonia+, BINPAS), whereas some have to be constructed following the text guidelines (e.g. GISS, EICAT), and others can be obtained as spreadsheets (e.g. GB-NNRA) or databases (e.g. NGEIAAS). To harmonize use of the protocols and facilitate data retrieval, a comprehensive Excel® spreadsheet template was developed to include all the protocols (see Suppl. material
Using the protocols selected in the spreadsheet template, 89 assessors independently assessed between three and 11 species (mean = 3.9) of the taxonomic group in their area of expertise (i.e. terrestrial plants, aquatic plants, terrestrial vertebrates, terrestrial insects, other terrestrial invertebrates, freshwater invertebrates, freshwater fish, marine species and pathogens) (Suppl. material
Before retrieving the data, each assessment was checked for completeness. Once all NNS assessments were completed, the final scores for each assessment were extracted. To harmonize scores across protocols, all ordinal scores (i.e. protocols with three or five levels as final scoring scale; Table
For each NNS and protocol (471 combinations), the mean and the coefficient of variation (CV) of the final score were calculated. The mean was used as the overall score across experts per NNS and protocol, whereas CV was used as an estimate of the consistency of scores across experts, adjusting for the mean value. First, differences in CV among all protocols were tested using a linear mixed model with protocol name as a fixed effect and species nested within taxonomic groups as random effects (i.e. random intercept model). Second, we used multimodel inference (
Differences in the mean CV among levels for the categorical variables in the best candidate model (i.e. with the smallest AICc) were tested for significance using a Tukey post hoc test. Prior to modelling, continuous predictors for the models above were checked for multicollinearity using Pearson correlations. All variables were selected for further analyses considering the low correlation values found (r < 0.5; Suppl. material
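The consistency measure described above (mean and CV of the final score for each NNS and protocol) can be illustrated with a minimal sketch. It assumes the CV is the sample standard deviation divided by the mean, and it uses invented species names, protocol names and scores; none of these values come from the study data.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_scores(assessments):
    """Return {(species, protocol): (mean, cv)} from (species, protocol, score) rows.

    CV is computed here as sample SD / mean (an assumption of this sketch).
    Combinations with fewer than two assessors are skipped because the
    sample SD is undefined for a single score.
    """
    groups = defaultdict(list)
    for species, protocol, score in assessments:
        groups[(species, protocol)].append(float(score))

    summary = {}
    for key, scores in groups.items():
        if len(scores) < 2:
            continue
        m = mean(scores)
        # A mean of zero makes the CV undefined; report NaN rather than fail.
        cv = stdev(scores) / m if m != 0 else float("nan")
        summary[key] = (m, cv)
    return summary


# Hypothetical assessments: full agreement for species A,
# one dissenting assessor out of five for species B.
assessments = [
    ("Species A", "Protocol X", 0.5),
    ("Species A", "Protocol X", 0.5),
    ("Species A", "Protocol X", 0.5),
    ("Species B", "Protocol X", 0.0),
    ("Species B", "Protocol X", 0.0),
    ("Species B", "Protocol X", 0.0),
    ("Species B", "Protocol X", 0.0),
    ("Species B", "Protocol X", 1.0),
]
summary = summarise_scores(assessments)
for (species, protocol), (m, cv) in sorted(summary.items()):
    print(f"{species} / {protocol}: mean = {m:.2f}, CV = {cv:.0%}")
```

Under these assumptions, complete agreement gives CV = 0, whereas a single non-zero score among five assessors gives a CV above 200%, which shows how strongly one dissenting assessor can affect the measure when the mean score is low.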
Similarities in the scoring of NNS across the different protocols were compared using hierarchical cluster analyses. Cluster analyses of the mean scores per NNS and protocol (calculations described above) were performed using Spearman’s correlation coefficient as a similarity measure and the complete linkage method (i.e. maximum distance between clusters). Using this method, we first carried out a cluster analysis of all NNS across the six protocols common to all taxonomic groups (i.e. GABLIS, GB-NNRA, EICAT, Harmonia+, GISS and NGEIAAS). Then, separate analyses were also performed for four subsets of NNS with common protocols: 1) aquatic and terrestrial plants, 2) aquatic animals (combining freshwater invertebrates, freshwater fish, and marine invertebrates), 3) terrestrial invertebrates (terrestrial insects and other terrestrial invertebrates), and 4) terrestrial vertebrates (Suppl. material
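As an illustration of the clustering approach described above, the following sketch builds a Spearman correlation between protocols and merges them by complete linkage. It is written with the Python standard library only and is not the software used in the study. It assumes that the distance between two protocols is 1 − ρ, and the protocol names (P1–P3) and scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Average ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranked values."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def complete_linkage(scores):
    """Agglomerative clustering of protocols; returns merges in order.

    scores maps a protocol name to its mean score per species (same species
    order for every protocol). Distance is assumed to be 1 - rho.
    """
    dist = {frozenset(pair): 1 - spearman(scores[pair[0]], scores[pair[1]])
            for pair in combinations(scores, 2)}

    def linkage(pair):
        # Complete linkage: the largest distance between members of two clusters.
        a, b = pair
        return max(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [frozenset([name]) for name in scores]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical mean scores of five species under three protocols.
scores = {
    "P1": [1, 2, 3, 4, 5],
    "P2": [2, 1, 3, 4, 5],
    "P3": [5, 4, 3, 2, 1],
}
merges = complete_linkage(scores)
for left, right, distance in merges:
    print(left, right, round(distance, 2))
```

Protocols that rank species similarly are merged first at a small distance; a protocol that ranks species in the opposite order joins last.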
The mean coefficient of variation (CV) of assessor scores per NNS and protocol was 40% (± 37% SD), with 10% (n = 470) showing complete agreement (CV = 0) among assessors and a maximum variability of 223% (four species in ISEIA: Aedes albopictus, Arion vulgaris, Australoheros facetus and Fascioloides magna; two species in EPPO-EIA: Diabrotica virgifera and Tuta absoluta). CV differed markedly among protocols (Fig.
Coefficient of variation (CV) of species scoring across assessors per impact assessment protocol based on linear mixed models controlling for taxonomic group and species as nested random effects and number of assessments per species as fixed effects. Protocols with the same letters above the graph are not significantly different (p < 0.05; Tukey test). Dots indicate the least squares means per protocol. Lines indicate the confidence interval (95%) around the means.
According to Tukey post hoc tests in the best candidate model, protocols using three score levels had significantly lower CV than the protocols using scales with five levels (difference = 0.25, p < 0.001) or more than five levels (difference = 0.29, p < 0.001). However, protocols with five score levels were similar to protocols with more than five levels (p = 0.27). CV across assessors was significantly lower for protocols that required higher expertise than those for which low expertise was required (Table
Mean regression coefficient and confidence interval (95%) of taxonomic groups (random effects) in the best linear mixed model explaining the coefficient of variation of scores of 57 invasive non-native species for 11 different protocols including all significant species, assessor and protocol characteristics (see Table
Average coefficient and Akaike weights for each species, assessor and protocol variable within the best linear mixed models (AICc < 6) explaining the coefficient of variation of the scores of 57 non-native species in 11 impact assessment protocols. Taxonomic groups and species identification were included as nested random effects. Predictors with a weight closer to one have a higher relative importance in explaining the response variable. Variables with a weight of zero were not included in the best subset of models used to calculate average coefficients.
Variable | Coefficient | Adjusted SE | z | P | Weight |
---|---|---|---|---|---|
Intercept | 0.36 | 0.06 | 5.76 | <0.001 | |
Number of assessments | | | | | 0 |
Species | | | | | |
Web of Science records (available knowledge) | -0.06 | 0.05 | 1.18 | 0.24 | 0.06 |
Assessor | | | | | |
Mean assessor expertise | -0.04 | 0.02 | 2.21 | 0.03 | 0.14 |
CV assessor expertise | | | | | 0 |
Protocol | | | | | |
Scoring scale | See results section | | | | 1 |
Expertise required | -0.14 | 0.02 | 7.76 | <0.001 | 1 |
Using maximum impact score (yes-no) | -0.12 | 0.02 | 4.93 | <0.001 | 1 |
Spread (yes-no) | 0.12 | 0.05 | 3.57 | <0.001 | 0.95 |
Impact type | | | | | 0 |
Number of questions | | | | | 0 |
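The predictor weights in the table above can be read as relative importance values. One common way to obtain such values in multimodel inference is to sum the Akaike weights of the candidate models that contain each predictor. The sketch below assumes that approach; the AICc values and predictor sets are invented for illustration and are not taken from the study.

```python
from math import exp


def akaike_weights(aicc_values):
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i is the AICc difference from the best (lowest) model."""
    best = min(aicc_values)
    relative = [exp(-(a - best) / 2) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]


def predictor_importance(models):
    """Sum the Akaike weights of every model that includes each predictor.

    models is a list of (aicc, set_of_predictor_names) pairs.
    """
    weights = akaike_weights([aicc for aicc, _ in models])
    importance = {}
    for weight, (_, predictors) in zip(weights, models):
        for name in predictors:
            importance[name] = importance.get(name, 0.0) + weight
    return importance


# Hypothetical candidate set (AICc values and predictors are made up).
models = [
    (100.0, {"scoring_scale", "expertise_required"}),
    (102.0, {"scoring_scale"}),
    (104.0, {"scoring_scale", "wos_records"}),
]
weights = akaike_weights([aicc for aicc, _ in models])
importance = predictor_importance(models)
for name, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}")
```

A predictor present in every candidate model receives a weight of one, whereas a predictor that appears only in poorly supported models receives a weight close to zero.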
The pair-wise correlations in NNS scores among the six protocols common to all taxa were highly diverse (min–max = 0.16–0.77; mean = 0.55), indicating low consistency in species scores among some protocols (Fig.
Spearman correlation matrix and hierarchical cluster of species scorings for the protocols common for all species. The color scale indicates the correlation between the species scorings obtained for each protocol pair. In brackets, the mean of all pair-wise correlations.
The comparison of impact assessment protocols for NNS shows that scoring variability across assessors can be substantial, depending on the taxonomic group considered and the scoring system. However, there is potential to reduce this variability by considering the expertise of the assessors and optimizing structural characteristics of the protocol. Furthermore, the ranking of NNS based on the protocol scoring can differ depending on the approach implemented, mainly based on the impact category type considered (i.e. whether socio-economic impacts are included). Thus, the selection of the scoring approach can have important consequences on the final ranking of NNS produced.
Scoring consistency across assessors and for some taxonomic groups was surprisingly low. It is not clear why these large discrepancies occurred even when the assessors were experts in invasion biology within their taxonomic domain. Many factors can influence the interpretations of context dependence found in the scientific literature, which can lead to subjective and inconsistent answers even amongst expert assessors (
Part of the variability in consistency was explained by protocol characteristics and the approaches implemented. Protocols with three score levels were more likely to show consistency among assessors than those with five or more levels. However, a three-category scoring system might not be sufficient to discriminate between NNS impacts or magnitude of impacts and rank NNS for prioritisation, because too many species will have the same score. Protocols that select the highest impact among different categories provided higher consistency. By definition, this approach will homogenise the scores towards higher values discarding inconsistencies from less important impacts in a way that results will be more conservative.
Protocols containing questions that required greater expertise on the species yielded higher scoring consistency than simpler protocols. Protocols requiring greater expertise demanded very detailed information about the species (e.g. expected population lifetime in NGEIAAS) that, when available, is very likely to be available only in few studies. Owing to the restricted number of sources of information, the variability in the final score might be low. Complex protocols might be less user-friendly and more time-consuming, but this in itself could increase focus and decrease subjectivity. Exceptions exist, e.g. the -ISK screening (
Regarding assessor and NNS characteristics, the only factor that significantly increased consistency among assessors was their level of expertise with the assessed species. Assessors who had previous experience with the NNS assessed may have had similarly high levels of knowledge of that NNS, and this may have led to similar scores. Nevertheless, this situation is infrequent, as NNS assessments are more commonly undertaken by persons familiar with the taxonomic group but not necessarily with the NNS being assessed (e.g. NNS not yet present or still rare in the study area). Unexpectedly, consistency was not related to the availability of information about the species (i.e. a higher number of WoS records). The simplest explanation is that a larger number of available studies does not necessarily mean more studies relevant to impact assessments, as the literature on these species could be linked to other research fields in invasion biology not directly associated with their environmental or socio-economic impacts. It is also relevant to note that different assessors might have had access to different information sources, particularly non-English literature and reports. This might have affected the consistency results, but we followed standard practices for NNS risk assessments. Further studies could examine these differences by providing all assessors with the same baseline information on the species to be assessed.
The high inconsistency found among assessors’ scores raises serious concerns and suggests that assessments conducted by single assessors should be interpreted with caution (
Variations among protocols in species scoring are mainly due to the inclusion, or not, of socio-economic impacts. Although socio-economic and environmental impacts are generally correlated (
Among all protocols, Harmonia+, FISK and GABLIS led to very different scores in comparison to the other protocols. This difference was partly related to the different impact categories considered but also to the inclusion of questions beyond impact (e.g. management in GABLIS and FISK). Finally, the GB-NNRA protocol showed a variable relation with other protocols across taxa: low correlation with protocols only considering environmental impacts for plants and terrestrial invertebrates but high for vertebrates. The final score in the GB-NNRA was not automatically calculated as in the other protocols. Instead, assessors were asked to provide overall summary scores and confidence rankings for the NNS based on the answers provided in previous sections, which include questions that consider both environmental and socio-economic impacts (
Several key factors should be taken into account when selecting or designing a NNS risk assessment protocol, such as the aim, the scope, the consistency and the accuracy of the outcomes, and the resources available to perform the assessment (e.g. time or information). As a first step, the suitability of a NNS risk assessment protocol will depend on the scope and aim of the assessment. For instance, if a NNS is already present in the region of interest, assessments on likelihood of entry and establishment are less meaningful than just the assessment of impact. Protocols with different scopes may produce different results in terms of NNS rankings (
Part of the inconsistency might also come from the way the protocol is used in practice (e.g. standardized forms, clear guidelines, selection of assessors, individual vs. group assessments). We propose three main ways to reduce this type of inconsistency. First, irrespective of the protocol, selecting a group of assessors with high expertise will yield more consistent results. Second, inconsistencies due to linguistic uncertainties (e.g. definitions, formulations, rating) can be reduced by improving the guidelines and by adequately training the assessors (
This article is based upon work from the COST Action TD1209: Alien Challenge. COST (European Cooperation in Science and Technology) is a pan-European intergovernmental framework. The mission of COST is to enable scientific and technological developments leading to new concepts and products and thereby contribute to strengthening Europe’s research and innovation capacities. PGM was supported by the CABI Development Fund (with contributions from ACIAR (Australia) and DFID (UK)) and by Darwin Plus, DPLUS074 ‘Improving biosecurity in the SAUKOTs through Pest Risk Assessments’. MV by Belmont Forum-Biodiversa project InvasiBES (PCI2018-092939). CP by Sciex-NMSch 12.108. JMJ and WCS by BiodivERsA (FFII project; DFG grant JE 288/7-1). JMJ by DFG project JE 288/9-1,9-2. CR and MB by Fundação para a Ciência e a Tecnologia grants SFRH/BPD/91357/2012 and SFRH/BPD/86215/2012, respectively. PS by MESTD of Serbia, grant #173025. JP by RVO 67985939 and 17-19025S. JCC was supported by a starting grant in the framework of the 2014 FCT Investigator Programme (IF/01606/2014/CP1230/CT0001).
Supplementary materials
Data type: statistical data
Explanation note: Figure S1: hierarchical cluster of the species scores for the six protocols common to all taxonomic groups. Figure S2: hierarchical cluster of the species scorings for plants and aquatic animals without correcting for sample size bias. Table S1: list of non-native species. Table S2: correlation analyses.
Supplementary materials
Data type: Spreadsheet template
Explanation note: Spreadsheet template to fill the 11 impact assessment protocols for non-native species considered in the study.