Impact assessment with different scoring tools: How well do alien amphibian assessments match?

Classification of alien species’ impacts can aid policy making through evidence-based listing and management recommendations. We highlight differences and a number of potential difficulties with two scoring tools, the Environmental Impact Classification for Alien Taxa (EICAT) and the Generic Impact Scoring System (GISS), using amphibians as a case study. Generally, GISS and EICAT assessments lead to very similar impact levels, but scores from the two schemes are not equivalent. Small differences are attributable to discrepancies in the verbal descriptions of the scores. Differences were found in several impact categories: while the issue of disease appears to be related to uncertainties in both schemes, impacts through hybridisation might be inflated in EICAT. We conclude that GISS scores cannot be directly translated into EICAT classifications, but the two schemes give very similar outcomes and the same literature base can be used for both.


Introduction
Alien species can cause a variety of changes in the areas to which they are introduced (Simberloff et al. 2013, Vilà et al. 2010). Impacts of invasive species can include changes to the environment, economy and social systems; they can vary in magnitude and can include positive as well as negative effects. In its Strategic Plan for Biodiversity, the Convention on Biological Diversity includes the identification and prioritisation of harmful alien species in Aichi Target 9 (UNEP 2011, McGeoch et al. 2016). To prioritise actions, and more generally to improve our understanding of alien species' impacts, we need ways to compare a multitude of variables measured on impacts caused through various mechanisms by species belonging to widely divergent taxonomic groups. Risk assessment tools in general, and impact assessments specifically, are used to prioritise species for management action (e.g., Leung et al. 2012, Kumschick and Richardson 2013, Essl et al. 2011). Because such tools are important in management prioritisation, policy making and regulation, it is crucial that they represent reality as accurately as possible. However, a systematic comparison between impact scoring tools is lacking. In this study, we were interested in whether two impact scoring systems relying on published evidence, rather than expert opinion, would lead to the same classification of alien species, using amphibians as a case study. Alien amphibians are an interesting group because the total number of introduced species is relatively small, so that they can be assessed in their entirety (Kraus 2009), and the quantity and quality of the literature reflect those of other taxa (Measey et al. 2016, Evans et al. 2016). The two impact scoring schemes we chose for the comparison are the Generic Impact Scoring System (GISS; Nentwig et al. 2016) and the Environmental Impact Classification for Alien Taxa (EICAT; Hawkins et al. 2015).
While EICAT was formally adopted by the IUCN as an official system to classify the threat posed by alien species to the native environment (https://portals.iucn.org/congress/motion/014), to be used alongside the Red List for species conservation, to date it has been systematically applied to only one taxonomic group, namely birds (Evans et al. 2016). GISS, on the other hand, is one of the most widely used impact scoring tools and has been applied to a wide variety of taxa, from plants (e.g., Novoa et al. 2016) to vertebrates (e.g., Evans et al. 2014, Martin-Albarracin et al. 2015) and invertebrates (e.g., Nentwig 2014, Nentwig 2015), spanning many habitats (see Nentwig et al. 2016 for an overview of previous applications). Given the many GISS assessments performed before the IUCN adopted EICAT, a comparison between these schemes can show to what extent GISS scores can be "translated" into EICAT classifications. If GISS scores and EICAT assessments consistently led to the same classification, we suggest that GISS scores could be adopted under IUCN as an interim measure until full EICAT assessments are made.
In this study, we use the same literature as source information to assess all alien amphibian species with EICAT and GISS. We ask (i) whether the two impact scoring schemes produce equivalent maximum classifications, (ii) whether GISS total scores correlate with EICAT assessments, and (iii) under which conditions anomalies occur. Furthermore, it is well known that some taxa receive more research attention than others (e.g., Pyšek et al. 2008). Because both scoring schemes rely solely on published evidence, species might reach higher scores in either scheme only because more information is available on their impacts. This would bias higher impact scores towards more "popular" species. To assess this issue, we ask whether the quantity of literature used to make an assessment correlates with a higher score (i.e., the sum and maximum in GISS, and the maximum in EICAT) in each scheme, and whether EICAT assessments with higher confidence ratings were underpinned by more references.

Species selection
We assessed all alien amphibians established anywhere outside their native range. The list was compiled from Kraus (2009) and supplemented by searches for species with introduced distributions indicated in the IUCN Red List, yielding a selection of 105 alien amphibians (see Measey et al. 2016 for details).

Literature search
Both schemes applied here rely on published literature. We used each species' scientific binomial as the search term in Web of Science and Google Scholar and then manually screened titles and abstracts for publications relevant to impacts of alien populations. We included articles published up to August 2015. Where the scientific name had changed recently (since 2000; e.g., Bufo marinus changed to Rhinella marina), we also searched under the older name. In addition, we screened the reference lists of relevant publications for further suitable sources.

GISS, EICAT and how they differ
GISS and EICAT both aim to produce a comparative score for different alien taxa based on published evidence. Both schemes have five levels of impact and distinguish between no impact and a lack of available data, which results in Data Deficient status in EICAT and no score in GISS. Table 1 outlines the impact levels of both schemes and the acronyms used for EICAT in this study. Both schemes also specify that the maximum score within a category should be the overall status for that species and category.

Table 1. Summary of GISS and EICAT scores applied across mechanisms (e.g., competition, hybridisation). See Hawkins et al. (2015) and Nentwig et al. (2016) for details of mechanisms.
EICAT category (acronym) | GISS score | EICAT description
Massive (MV) | 5 | Causes at least local extinction of native species, and irreversible changes in community composition; even if the alien taxon is removed, the system does not recover its original state
Major (MR) | 4 |
Moderate (MO) | 3 |
Minor (MN) | 2 |
Minimal concern (MC) | 1 |

Although amphibian impacts have previously been assessed using EICAT (Kraus 2015), we did not consider these data, because classifications were not reported for individual species and only high-impact amphibians were included in that study. GISS and EICAT differ in (i) the number of categories (i.e., mechanisms) and (ii) the details of what is required to score a species in any category. The details of both schemes are published elsewhere (Hawkins et al. 2015, Nentwig et al. 2016) and are summarised in Table 1. GISS scores concentrate on the spatial scale at which an alien species has an impact as well as on the number of native species it affects. EICAT has no intrinsic spatial scale; instead, the impact on the invaded community (however large or small) dictates the level of threat. Furthermore, EICAT focuses on single species affected within a community and therefore does not take into account the number of native species affected by the alien species.
The schemes also differ in that GISS provides categories for economic as well as environmental impacts, whereas EICAT includes only environmental impacts. Here we use only the environmental scores of both schemes, because economic assessments were poorly populated for amphibians (see Measey et al. 2016) and because the results needed to remain comparable between the two schemes.
In addition to the maximum GISS score (1 to 5), GISS gives a sum of all scores across all categories (1 to 30), whereas EICAT uses only the maximum score. EICAT also assigns each assessment a confidence level ranging from low to high, as described in Hawkins et al. (2015). The most recently published GISS guidelines refer to the EICAT guidelines for confidence assessment (Nentwig et al. 2016), but we did not include GISS confidence levels in the analyses because previous GISS publications did not report them (e.g., Nentwig et al. 2010).
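A minimal Python sketch of how the two summary values follow from per-mechanism scores is shown below. The mechanism names, scores and helper functions are hypothetical. `None` stands for a mechanism without usable evidence. When no mechanism can be scored, the result is no GISS score or a Data Deficient (DD) EICAT status.

```python
from typing import Optional

# EICAT categories ordered from lowest to highest impact (Table 1).
EICAT_ORDER = ["MC", "MN", "MO", "MR", "MV"]


def giss_summary(scores: dict[str, Optional[int]]) -> Optional[tuple[int, int]]:
    """Return (maximum, sum) of GISS scores across mechanisms.

    Mechanisms without evidence are given as None and ignored; if no
    mechanism could be scored, the species receives no GISS score (None).
    """
    scored = [s for s in scores.values() if s is not None]
    if not scored:
        return None
    return max(scored), sum(scored)


def eicat_overall(classes: dict[str, Optional[str]]) -> str:
    """Return the highest EICAT category across mechanisms, or 'DD'."""
    scored = [c for c in classes.values() if c is not None]
    if not scored:
        return "DD"  # Data Deficient
    return max(scored, key=EICAT_ORDER.index)


# Hypothetical species scored under three environmental mechanisms.
giss = {"predation": 4, "competition": 3, "hybridisation": None}
eicat = {"predation": "MR", "competition": "MO", "hybridisation": None}
print(giss_summary(giss))    # (4, 7)
print(eicat_overall(eicat))  # MR
```

The sum rewards species documented under many mechanisms, whereas the maximum does not. This is one reason the maximum GISS score is the more direct counterpart of an EICAT classification.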

Data analyses
We used a paired Wilcoxon signed rank test to assess how similar the maximum and total GISS scores were to the EICAT classifications. For this, we assigned numerical values to EICAT classifications, from 1 for MC to 5 for MV; we refer to these values as nEICAT. We used a non-parametric correlation test (Kendall's tau) to assess the relationship between the number of publications found per species and (i) nEICAT, (ii) the maximum GISS score and (iii) the sum of all GISS scores for each species. Furthermore, we were interested in whether species assessed with higher confidence in EICAT had more publications underpinning their impacts. Confidence levels (low, medium and high) were assigned scores of 1, 2 and 3, respectively, and analysed with a Kendall's tau correlation test against the number of publications used for each species. All analyses were performed in R v3.2.1 (R Core Team 2015).
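The coding and correlation steps above can be sketched in dependency-free Python. This is an illustration, not the authors' R code. The two dictionaries mirror the nEICAT and confidence codes described in the text. `kendall_tau_b` is a hypothetical helper that implements the tie-corrected tau-b variant; that this variant was used is our assumption, and tie correction matters here because nEICAT takes only five values. The example data are invented. The sketch returns only the coefficient; P values would come from a routine such as R's `cor.test(..., method = "kendall")`.

```python
import math

# nEICAT codes: 1 for MC up to 5 for MV.
N_EICAT = {"MC": 1, "MN": 2, "MO": 3, "MR": 4, "MV": 5}
# EICAT confidence levels coded 1-3.
CONFIDENCE = {"low": 1, "medium": 2, "high": 3}


def kendall_tau_b(x, y):
    """Tie-corrected Kendall's tau-b for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs that are not tied in x / not tied in y
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            untied_x += int(dx != 0)
            untied_y += int(dy != 0)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Invented example: EICAT classes, confidence and publication counts.
classes = ["MC", "MN", "MN", "MR", "MV"]
confidence = ["low", "low", "medium", "high", "medium"]
publications = [1, 2, 4, 6, 3]

n_eicat = [N_EICAT[c] for c in classes]
print(kendall_tau_b(n_eicat, publications))
print(kendall_tau_b([CONFIDENCE[c] for c in confidence], publications))
```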

Results
The maximum scores produced by the two impact scoring systems were not equivalent. The paired Wilcoxon signed rank test was significant (V = 25; P < 0.0001; Figure 1a), reflecting a systematic tendency for EICAT classifications to be higher. The scores were nonetheless similar, in that they never differed by more than one level. Of the 40 species for which we found relevant literature and which had maximum scores in both systems, 40% had equivalent scores, while 55% scored higher in EICAT and 5% higher in GISS. All species that scored higher in EICAT (n = 22) were a single category higher, and both species that scored higher in GISS (n = 2) were a single category lower in EICAT. Consequently, most EICAT categories span at least two maximum GISS scores (MO spans three), whereas MC corresponded directly to the maximum GISS score for all four species (Table 2).
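As a consistency check, the paired test statistic can be re-derived in Python from the counts reported above. We assume the test was run on GISS maximum minus nEICAT, as in R's `wilcox.test(giss, neicat, paired = TRUE)`. This ordering is inferred because it reproduces V = 25; the text does not state it. The helper names are hypothetical, and the individual score values are placeholders chosen only to produce the reported differences.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean 1-based rank of the run
        i = j + 1
    return ranks


def paired_wilcoxon(x, y):
    """Paired Wilcoxon signed rank test on x - y, normal approximation.

    Follows the conventions of R's wilcox.test(x, y, paired = TRUE) when
    ties are present: zero differences are dropped, V is the sum of the
    ranks of the positive differences, and z uses a tie correction and a
    continuity correction. Returns (V, two-sided P).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    abs_d = [abs(v) for v in d]
    ranks = average_ranks(abs_d)
    v_stat = sum(r for r, v in zip(ranks, d) if v > 0)
    # Tie correction term: sum of (t^3 - t) over groups of equal |d|.
    tie_sizes = {}
    for a in abs_d:
        tie_sizes[a] = tie_sizes.get(a, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - ties / 48)
    z = v_stat - n * (n + 1) / 4
    correction = 0.5 if z > 0 else (-0.5 if z < 0 else 0.0)
    z = (z - correction) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return v_stat, p


# Differences implied by the reported counts for 40 species: 16 equal,
# 22 one level higher in EICAT, 2 one level higher in GISS.
giss_max = [3] * 16 + [3] * 22 + [4] * 2
n_eicat = [3] * 16 + [4] * 22 + [3] * 2
v, p = paired_wilcoxon(giss_max, n_eicat)
print(v)           # 25.0
print(p < 0.0001)  # True
```

Every non-zero difference has magnitude 1, so all 24 ranks tie at 12.5 and V is simply 2 × 12.5. The significant P therefore reflects the 22:2 imbalance in the direction of the differences, not large differences in magnitude.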
GISS total scores were not consistently related to EICAT assessments (paired Wilcoxon signed rank test: V = 315.5; P = 0.315; Figure 1b). Top total GISS scores (>10) reached only MR in EICAT, with a single exception: the tiger salamander Ambystoma tigrinum, which was classified in the highest category (MV). Other amphibians classified as MV under EICAT had very low total GISS scores of 4. These totals are also their maximum scores, because these species scored under only one mechanism. The anomalies (see Figure 1b) arise from high scores for hybridisation in EICAT compared with comparatively low scores in GISS. It was also noteworthy that total GISS scores differed little between the MC and MN classes in EICAT.
In total, we found 242 relevant publications for 40 species, with an average of 5.9 publications per species (excluding the 65 species for which no data were available). A full reference list can be found in Measey et al. (2016). Neither EICAT classifications nor maximum GISS scores were significantly related to the number of publications found on the species' impacts (Kendall's tau = 0.24 and 0.25; P = 0.059 and 0.055, respectively; Figure 2). However, the sum of environmental GISS scores was significantly, and more strongly, related to the number of publications (Kendall's tau = 0.41; P = 0.048; Figure 2c). Lastly, EICAT classifications with higher confidence were not underpinned by more publications (Kendall's tau = 0.21; P = 0.121).

Discussion
This paper presents the first systematic EICAT assessment for amphibians detailing species-specific classifications. Kraus (2015) assessed the impacts of selected amphibians using EICAT, without, however, reporting impact levels per species.
Our study shows that for alien amphibians, EICAT assessments are not equivalent to maximum or total scores under GISS. This means that we cannot simply adopt GISS assessments under IUCN instead of performing full EICAT assessments. However, we found that the scores were very similar and, where they did differ, they differed by a single level of impact. The broad agreement between these two impact scoring schemes is encouraging, as it suggests that each provides a comparative measure of impact despite having different sets and numbers of criteria. Moreover, because both schemes rely on the same type of data, namely published evidence, once literature has been amassed for a GISS score, the same sources can be productively used for an EICAT assessment. The detailed EICAT assessments for each species will be externally reviewed and published under the IUCN umbrella on the Global Invasive Species Database (GISD; http://www.iucngisd.org/gisd/) after acceptance by the EICAT Unit (Hawkins et al. 2015).
Figure 2. The relationship between the number of publications with data that can be used to assess impact for a species of alien amphibian and (a) its EICAT score, (b) its GISS score and (c) the sum of its environmental GISS scores.
Of particular note are species that score the highest possible level in one system but not in the other: 5 in GISS but MR in EICAT, or MV in EICAT but 4 in GISS. This is the case for three species (Table 2). Rhinella marina reached GISS scores of 5 in two categories, namely "Impacts on animals through [...] intoxication" and "Impacts through transmission of diseases [...]". A local extinction of Dasyurus hallucatus occurred in Australia where quolls were poisoned when they preyed on R. marina (Oakwood and Foster 2008). Because the effect was considered reversible, however, it was classified as MR in EICAT. R. marina has also been shown to host a parasite that negatively affects native Australian frogs and was not present in the area before the toads arrived (Hartigan et al. 2010, 2011, 2012). The GISS formulation of a maximum disease impact (see Nentwig et al. 2016) leaves room for different assessors to score different impacts based on their interpretation, which might have led to a high score in GISS and an MO in EICAT. Given the severity of the effects of Batrachochytrium dendrobatidis and other diseases, both EICAT and GISS appear to highlight the difficulty of attributing the spread of disease, and its transmission to native species, to alien taxa (see also Measey et al. 2016, Evans et al. 2016), although this role is widely acknowledged in amphibians (Fisher and Garner 2007).
The two Pelophylax species classified highest in EICAT but not in GISS had demonstrated impacts related to hybridisation, predation and competition with native species. The two schemes have in common that, for the low to medium impact levels of 1-3 (GISS) and MC to MO (EICAT), hybrids of the native and alien species need to be sterile. However, EICAT and GISS differ in how they distinguish the two highest impact levels. Higher impacts through hybridisation in GISS are determined by the relative quantity of hybrid populations (Nentwig et al. 2016). Given that EICAT scores have not been published before for amphibians, we would like to point out a feature of the scheme that could be problematic for some taxa. According to Hawkins et al. (2015), hybridisation impacts on native species follow a slightly different logic from the remaining categories. Here the fitness (and capacity to produce offspring) of the hybrids is considered in addition to the fitness of the pure native species. EICAT distinguishes the two highest classifications by the vigour of F1 offspring: MV corresponds to fully vigorous and fertile F1 hybrids and MR to sterile F1 hybrids (Hawkins et al. 2015). EICAT therefore stipulates no proportion of hybrids for reaching the maximum classification (MV). Consequently, for many amphibians with fertile F1 hybrids, an EICAT classification lower than MV does not appear to be possible.
Hybridisation should be carefully considered in amphibians, especially frogs and salamanders, as some of these species readily hybridise through polyploidy and may have done so for many decades (e.g., Vorburger and Reyer 2003). To the best of our knowledge, no native species has been lost from any specific location despite destabilising hybridisation favouring the alien taxon (e.g., Quilodrán et al. 2015, Leuenberger et al. 2014). Strictly following the guidelines of Hawkins et al. (2015), only species whose F1 hybrids with the native species are sterile could have MR impacts. Species with fertile F1 hybrids would be classified as MV, on the assumption that this always leads to genomic extinction of the native species. In the GISS hybridisation category, impact levels 4 and 5 are distinguished only by the size of the hybrid population (and of the remaining native population), which in the case of frogs might be a more sensible way to classify alien species' impacts through this mechanism. We feel that this would also be more in line with the impact levels of the remaining mechanisms in EICAT.
Furthermore, in some cases, taxa previously subsumed under a single species name have been split into two species, which "creates" a previously unrecognised hybridisation impact of one species on the other. An example is the hybridisation of tiger salamanders (Ambystoma tigrinum) with the California tiger salamander (Ambystoma californiense) (e.g., Riley et al. 2003, Fitzpatrick et al. 2010). This issue is not restricted to amphibians but could arise whenever subspecies are elevated to species status. Nor is it restricted to hybridisation; it could, for example, also involve competition (e.g., Arntzen and Thorp 1999). Assessments may therefore need to be revised when taxonomy is updated.
Summing impact scores can potentially bias results towards species with greater research effort, as different mechanisms are more likely to have been studied for these species. Our data on the number of publications available for each assessment are not atypical (Measey et al. 2016), and similar patterns should therefore be expected in other taxa. Using maximum scores not only for EICAT but also for GISS assessments, as suggested previously (e.g., Kumschick et al. 2016), can reduce this bias. Still, alien species that affect recipient communities through various mechanisms might be more problematic, as their impacts are less specific and probably less context dependent. For example, consider species impacting communities only through hybridisation (e.g., Pelophylax ridibundus and P. bedriagae in our study; Arano et al. 1995, Pagano et al. 1997, Holsbeek et al. 2008). These are less likely to cause such impacts in areas where the native species concerned are absent than species like A. tigrinum, which also affect native communities through predation (Ryan et al. 2009). Furthermore, our results suggest that high confidence in an assessed impact score might come from a single, well-executed study, while many studies that are poor with respect to defining impact will not result in a higher level of confidence (but see Evans et al. 2016). Likewise, many good studies might result in high confidence for a lesser impact level, whereas a single less rigorous study may result in a higher impact level with poor confidence. Therefore, we emphasise the importance of reporting more than just the highest score and its mechanism when classifying taxa, including other high-confidence findings as well as information on different impact mechanisms (Hawkins et al. 2015).

Conclusion
The adoption of a single impact scoring scheme under an international umbrella such as IUCN is necessary, yet we show the potential pitfalls of converting scores between two widely used schemes, GISS and EICAT. These schemes are largely congruent, but they present some challenges for amphibians, which we expect will also arise for other taxa in time. Here, one scheme might borrow from the other to resolve apparent discrepancies. Impact levels in general, and those assigned for disease transmission and hybridisation in particular, require detailed background information to support the classification, and additional guidelines should be considered to make classifications more consistent in this regard.