Corresponding author: Ashley N. Schulz (anschulz7@gmail.com). Academic editor: Richard Shaw
© 2020 Ashley N. Schulz, Angela M. Mech, Craig R. Allen, Matthew P. Ayres, Kamal J. K. Gandhi, Jessica Gurevitch, Nathan P. Havill, Daniel A. Herms, Ruth A. Hufbauer, Andrew M. Liebhold, Kenneth F. Raffa, Michael J. Raupp, Kathryn A. Thomas, Patrick C. Tobin, Travis D. Marsico.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Schulz AN, Mech AM, Allen CR, Ayres MP, Gandhi KJK, Gurevitch J, Havill NP, Herms DA, Hufbauer RA, Liebhold AM, Raffa KF, Raupp MJ, Thomas KA, Tobin PC, Marsico TD (2020) The impact is in the details: evaluating a standardized protocol and scale for determining non-native insect impact. NeoBiota 55: 61-83. https://doi.org/10.3897/neobiota.55.38981
Assessing the ecological and economic impacts of non-native species is crucial to providing managers and policymakers with the information necessary to respond effectively. Most non-native species have minimal impacts on the environments into which they are introduced, but a small fraction are highly deleterious. The definition of ‘damaging’ or ‘high-impact’ varies based on the factors determined to be valuable by an individual or group, but interpretations of whether non-native species meet particular definitions can be influenced by the interpreter’s bias or level of expertise, or by a lack of group consensus. Uncertainty or disagreement about an impact classification may delay or otherwise adversely affect policymaking on management strategies. One way to prevent these issues would be to adopt a detailed, nine-point impact scale that leaves little room for interpretation and then divide the scale into agreed-upon categories, such as low, medium, and high impact. Following a previously conducted, exhaustive search regarding non-native, conifer-specialist insects, the authors independently read the same sources and scored the impact of 41 conifer-specialist insects to determine whether any variation among assessors existed when using a detailed impact scale. Each of the authors, who were selected to participate in the working group associated with this study because of their diverse backgrounds, also provided their level of expertise and uncertainty for each insect evaluated. We observed 85% congruence in impact rating among assessors, with 27% of the insects having perfect inter-rater agreement. Variance in assessment peaked in insects with a moderate impact level, perhaps due to ambiguous information or prior assessor perceptions of these specific insect species. The authors also participated in a joint fact-finding discussion of the two insects with the most divergent impact scores to isolate potential sources of variation in assessor impact scores.
We identified four themes that could be experienced by impact assessors: ambiguous information, discounted details, observed versus potential impact, and prior knowledge. To improve consistency in impact decision-making, we encourage groups to establish a detailed scale that would allow all observed and published impacts to fall under a particular score, provide clear, reproducible guidelines and training, and use consensus-building techniques when necessary.
Keywords: environmental impact, expert opinion, impact assessment, joint fact-finding, non-native species management, policy-making, uncertainty
Globally, anthropogenic, abiotic, and biotic threats increasingly affect the structure and function of forest ecosystems (
In particular, consensus among experts may be difficult to achieve (
Although disagreements may arise, impact assessments perform a crucial role in biosecurity programs for management of non-native species (
To help remedy inconsistency and disagreement among assessors, standard impact scoring systems (
Impact scores were recently used to categorize non-native forest insects that specialize on conifers (Mech et al. 2019a). During this project, a group of scientists (the “High-Impact Insect Invasion” working group; HIWG) collaborated to create a detailed nine-point scale of impact, but only one assessor was responsible for determining the impact score for the 58 non-native conifer-specialists currently in North America. These scores were eventually used as the basis for a statistical model that will be used to predict the impact of non-native conifer-specialists that have not yet become established in North America (Mech et al. 2019a). The purpose of our study was to evaluate whether the impact scale used in Mech et al. (2019a) is detailed enough for multiple people with different levels of expertise to reach the same impact score. We examined how level of expertise, uncertainty, and disagreement may affect impact assessment of non-native conifer-specialist insect species. Specifically, the objectives of the study were to: 1) evaluate the level of consensus among individual assessments of non-native insect impacts; 2) measure correlation among level of prior expertise, impact score, and assessor level of uncertainty; 3) assess the points of agreement and disagreement to determine which types of insects are the most difficult to assess with consensus; and 4) explore how experts can use joint fact-finding, a form of consensus-building, to identify sources of highly divergent impact scores and achieve consensus in decision-making using a case study of two insect species with highly divergent impact scores.
In 2016, the HIWG, composed mainly of the co-authors of this paper, convened to examine the drivers of non-native insect invasions (Mech et al. 2019a) and develop a model to predict future high-impact, non-native, phytophagous insect species in natural ecosystems in North America. The group of scientists had different specialties (Suppl. material
The HIWG designed an original nine-point scale (Fig.
The HIWG initiated their research by conducting a pilot study on the 58 non-native, conifer-specialist insect species (i.e., restricted to feeding on one or more of the three conifer families in North America: Cupressaceae, Pinaceae, and Taxaceae) currently in North America (Mech et al. 2019a; Suppl. material
For this study, we were interested in evaluating the impact scale used in Mech et al. (2019a), so we also focused on non-native, conifer-specialist insects in North America. For each conifer-specialist insect, assessors were provided with the list of references that described the host damage used to determine the impact scores used in Mech et al. (2019a). Of the 58 conifer-specialist insects that were originally identified in the pilot study, 17 insect species were excluded from our study because they received an impact score of one. This meant there was no documented damage and, therefore, no references were provided. The remaining 41 conifer-specialist insects (Suppl. material
Examples of non-native, conifer-specialist insects, including (A) European spruce sawfly (Gilpinia hercyniae), (B) spruce needle aphid (Elatobium abietinum), (C) lesser spruce shoot beetle (Hylurgops palliatus), (D) hemlock woolly adelgid (Adelges tsugae), (E) pale juniper webworm (Aethes rutilana), (F) elongate hemlock scale (Fiorinia externa), (G) larch sawfly (Pristiphora erichsonii), and (H) Japanese cedar longhorned beetle (Callidiellum rufipenne) with the (I) mean (± SE) insect impact score (black bars) and within-group interrater agreement index (rWG, gray bars) for all 41 conifer-specialist insects assessed in this study.
For each insect, the three new assessors were provided the same list of references as the initial assessor. The new assessors did not have access to the impact score assigned by the initial assessor to avoid bias. The references provided for each insect were mostly exhaustive, but for well-studied species (e.g., hemlock woolly adelgid [Adelges tsugae Annand]), references that were representative of the damage repeatedly found in published articles were selected in lieu of providing all impact literature. No publications or websites, other than the ones provided, could be used by the assessors. Further, assessors were advised to not use their existing knowledge to evaluate impact and base their impact score solely on the information provided in the references.
Prior to completing the impact assessment exercise, assessors were provided with a sample score sheet that was developed by the first author. The score sheet included directions on how to assess impact and self-assign their level of expertise and uncertainty for each insect (Suppl. material
Descriptive statistics were calculated for impact score and assessor levels of expertise and uncertainty for each insect, with all means reported ± 1 SE. A power function analysis was used to determine the required number of assessments per species. To evaluate the overall level of consensus among assessors, we calculated Krippendorff’s alpha (Kα), a coefficient used to measure agreement among observers (
$$r_{WG} = 1 - \frac{s_x^2}{\sigma_E^2},$$
where $s_x^2$ is the observed variance among the impact scores from the four assessors, and $\sigma_E^2$ is the expected variance in the case of no consensus among assessors (
When assessors are in perfect agreement, rWG equals one, and the index approaches zero as disagreement increases. As with Kα, 0.70 is the traditionally accepted threshold separating high from low assessor agreement for rWG, whereby values ≥ 0.70 indicate high agreement among assessors (
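As a worked illustration of the agreement index, the following minimal sketch computes rWG for a single insect, taking the index as one minus the ratio of observed to expected variance. It assumes that the no-consensus null is a uniform distribution over the nine scale points, so that the expected variance is (A² − 1)/12 with A = 9. That choice of null and all function names are assumptions for illustration, not details stated in the text. Under this assumption, the largest impact SE in the results table (1.25 for Elatobium abietinum, n = 4) yields an rWG of about 0.06, which agrees with the reported minimum.

```python
from statistics import variance

SCALE_POINTS = 9  # nine-point impact scale
N_ASSESSORS = 4   # four assessors scored each insect


def expected_variance(scale_points: int = SCALE_POINTS) -> float:
    """Variance of a discrete uniform distribution over the scale (assumed null)."""
    return (scale_points ** 2 - 1) / 12


def rwg_from_scores(scores, scale_points: int = SCALE_POINTS) -> float:
    """r_WG from raw assessor scores, using their sample variance."""
    return 1 - variance(scores) / expected_variance(scale_points)


def rwg_from_se(se: float, n: int = N_ASSESSORS,
                scale_points: int = SCALE_POINTS) -> float:
    """Recover the sample variance from a reported SE (SE = s / sqrt(n)), then compute r_WG."""
    observed_variance = (se ** 2) * n
    return 1 - observed_variance / expected_variance(scale_points)


if __name__ == "__main__":
    # SE of the impact score for Elatobium abietinum, taken from the results table.
    print(round(rwg_from_se(1.25), 2))
    # Hypothetical scores in perfect agreement (not data from the study).
    print(rwg_from_scores([2, 2, 2, 2]))
```

With this null, an insect crosses the 0.70 threshold when the sample variance of its four scores is at most 2.0, which corresponds to an SE of roughly 0.71.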
Spearman’s rank correlation tests were conducted to measure the correlations between assessor levels of expertise and uncertainty. To measure whether expertise and uncertainty influence assignment of impact scores, we calculated the coefficients of variation for insect impact score, level of expertise, and level of uncertainty using the four assessor scores and ratings for each insect. We then conducted two Spearman’s rank correlation tests on these coefficients of variation: one between level of expertise and impact score, and one between level of uncertainty and impact score.
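The two-step procedure described above (a per-insect coefficient of variation, then a rank correlation across insects) can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code: the ratings are hypothetical, the helper names are invented for the example, and it returns only the correlation coefficient without the significance test a full analysis would need.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return stdev(values) / mean(values)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical ratings (four assessors per insect); not data from the study.
impact = [[5, 4, 6, 4], [2, 2, 3, 2], [9, 8, 9, 9], [3, 5, 2, 6]]
expertise = [[1, 2, 1, 3], [2, 2, 3, 2], [5, 4, 5, 5], [1, 1, 2, 4]]

cv_impact = [coefficient_of_variation(s) for s in impact]
cv_expertise = [coefficient_of_variation(s) for s in expertise]
rho = spearman_rho(cv_impact, cv_expertise)

if __name__ == "__main__":
    print(f"Spearman's rho between CVs: {rho:.3f}")
```

Working with coefficients of variation rather than raw ratings asks whether insects on which assessors differ in expertise (or uncertainty) are also the insects on which they differ in impact score.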
Following the completion and compilation of all assessments, assessors met in person for a joint fact-finding session in August 2017 to identify potential sources of variation for insects with highly divergent impact scores. For our joint fact-finding discussion (
Mean impact scores ranged from 1.5 ± 0.5 for lesser spruce shoot beetle (Hylurgops palliatus Gyllenhal; Fig.
Summary of descriptive statistics (mean ± SE) for the self-assessed level of expertise (range of 1–5, in which 1 is no expertise and 5 is high expertise), impact level (range of 1–9, in which 1 is no documented damage and 9 is functional extinction of the host plant), and self-assessed level of uncertainty (scale of 1–5, where 1 is low uncertainty and 5 is high uncertainty) for each insect species assessed in this study.
Conifer-specialist Insect Species | Mean ± SE Expertise | Mean ± SE Impact | Mean ± SE Uncertainty |
---|---|---|---|
Acantholyda erythrocephala | 1.50 ± 0.50 | 4.75 ± 0.48 | 2.75 ± 0.25 |
Adelges abietis | 2.75 ± 0.75 | 2.00 ± 0.00 | 1.50 ± 0.29 |
Adelges laricis | 2.75 ± 0.85 | 1.75 ± 0.25 | 1.75 ± 0.25 |
Adelges piceae | 3.75 ± 0.95 | 8.50 ± 0.50 | 2.00 ± 0.70 |
Adelges tsugae | 4.75 ± 0.25 | 9.00 ± 0.00 | 1.50 ± 0.29 |
Aethes rutilana | 2.00 ± 0.41 | 1.75 ± 0.25 | 1.75 ± 0.48 |
Aspidiotus cryptomeriae | 2.75 ± 0.48 | 2.50 ± 0.29 | 2.00 ± 0.41 |
Brachyderes incanus | 2.00 ± 0.00 | 2.25 ± 0.63 | 2.50 ± 0.65 |
Callidiellum rufipenne | 1.50 ± 0.29 | 2.75 ± 0.85 | 2.25 ± 0.75 |
Carulaspis juniperi | 2.50 ± 0.87 | 3.25 ± 0.63 | 2.50 ± 0.96 |
Carulaspis minima | 1.75 ± 0.25 | 2.75 ± 0.85 | 3.00 ± 0.71 |
Cinara tujafilina | 2.25 ± 0.25 | 2.00 ± 0.00 | 1.75 ± 0.25 |
Coleophora laricella | 2.00 ± 0.41 | 4.75 ± 0.25 | 1.75 ± 0.48 |
Contarinia baeri | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.25 |
Dichomeris marginella | 2.00 ± 0.71 | 2.25 ± 0.25 | 2.75 ± 0.75 |
Diprion similis | 2.25 ± 0.63 | 4.50 ± 0.65 | 2.50 ± 0.29 |
Dynaspidiotus tsugae | 2.25 ± 0.25 | 3.00 ± 0.58 | 3.00 ± 0.71 |
Elatobium abietinum | 2.00 ± 0.41 | 5.25 ± 1.25 | 1.75 ± 0.25 |
Epinotia nanana | 1.25 ± 0.25 | 1.75 ± 0.25 | 2.25 ± 0.48 |
Eulachnus agilis | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.75 |
Eulachnus brevipilosus | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.75 ± 0.85 |
Eulachnus rileyi | 1.75 ± 0.25 | 2.00 ± 0.00 | 2.50 ± 0.65 |
Exoteleia dodecella | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.48 |
Fiorinia externa | 2.75 ± 1.03 | 3.50 ± 0.96 | 2.75 ± 0.75 |
Gilpinia frutetorum | 2.50 ± 0.50 | 2.75 ± 0.48 | 2.25 ± 0.63 |
Gilpinia hercyniae | 2.25 ± 0.48 | 5.00 ± 1.08 | 3.00 ± 0.58 |
Hylastes opacus | 2.75 ± 0.75 | 2.50 ± 0.65 | 1.75 ± 0.48 |
Hylurgops palliatus | 1.75 ± 0.25 | 1.50 ± 0.50 | 1.75 ± 0.48 |
Hylurgus ligniperda | 1.50 ± 0.29 | 2.25 ± 0.63 | 2.50 ± 0.29 |
Matsucoccus matsumurae | 2.50 ± 0.65 | 6.50 ± 0.50 | 2.75 ± 0.48 |
Mindarus abietinus | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.63 |
Neodiprion sertifer | 2.75 ± 0.85 | 2.50 ± 0.50 | 2.00 ± 0.41 |
Ocnerostoma piniariella | 2.00 ± 0.41 | 1.75 ± 0.25 | 2.25 ± 0.48 |
Phyllobius intrusus | 2.00 ± 0.41 | 2.25 ± 0.63 | 2.25 ± 0.48 |
Physokermes hemicryphus | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.25 |
Pineus boerneri | 3.25 ± 0.85 | 4.75 ± 0.63 | 2.50 ± 0.65 |
Pristiphora erichsonii | 2.50 ± 0.87 | 5.75 ± 0.85 | 2.00 ± 0.41 |
Rhyacionia buoliana | 3.25 ± 0.48 | 2.50 ± 0.50 | 1.75 ± 0.25 |
Sirex noctilio | 2.50 ± 0.96 | 4.75 ± 0.63 | 2.00 ± 0.41 |
Thera juniperata | 2.00 ± 0.41 | 2.00 ± 0.00 | 2.25 ± 0.48 |
Tomicus piniperda | 3.00 ± 0.71 | 2.75 ± 0.25 | 1.50 ± 0.29 |
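Each entry in the table is a mean ± 1 SE over four assessor ratings. A minimal sketch of that calculation follows, assuming the SE is the sample standard deviation divided by the square root of the number of assessors. The individual scores shown are hypothetical, chosen only because they reproduce one table entry (8.50 ± 0.50); the actual per-assessor scores are not listed here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_se(scores):
    """Return (mean, standard error), with SE = sample SD / sqrt(n)."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


def format_mean_se(scores):
    """Format ratings in the table's 'mean ± SE' style with two decimals."""
    m, se = mean_se(scores)
    return f"{m:.2f} ± {se:.2f}"


if __name__ == "__main__":
    # Hypothetical impact scores from four assessors (not data from the study).
    print(format_mean_se([9, 9, 9, 7]))
    # Perfect agreement gives an SE of zero.
    print(format_mean_se([2, 2, 2, 2]))
```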
The rWG index, used to assess within-group variation for each species, ranged from 0.06 to 1.00, with 85% (35 of 41) of the insects having an rWG ≥ 0.70 and 27% (11 of 41) having an rWG = 1.00 (Fig.
Insect impact scores assigned by each of the four assessors for each insect. Insects with the most disagreement are at the top of the figure, whereas insects with the most consensus are at the bottom of the figure.
Within-group inter-rater agreement (rWG) values (0–1, with 0 indicating no agreement and 1 indicating perfect agreement) for each mean (± SE) insect impact (1–9, with 1 indicating low impact and 9 indicating high impact insect species) with a trendline shown in red.
The mean self-assessed level of expertise ranged from 1.25 ± 0.25 (novice; no expertise) for European spruce needle miner (Epinotia nanana Treitschke) to 4.75 ± 0.25 (expert; high expertise) for hemlock woolly adelgid (Table
The joint fact-finding discussion on European spruce sawfly and spruce needle aphid allowed the working group to constructively reflect on the variation in insect impact scores and identify potential sources of uncertainty. The joint fact-finding meeting also provided a forum to discuss problems that assessors encountered when assigning impact scores for other insects included in this study. Four common themes emerged from the discussion: ambiguous information, discounted details, observed vs. potential impact, and prior knowledge (Table
Significant, positive correlation between level of expertise (scale of 1–5, from no to high expertise) and level of uncertainty (scale of 1–5, from low to high uncertainty) with bubbles that are proportional to the number of overlapping data points.
Common themes that emerged from the joint fact-finding discussion on variation in non-native, conifer-specialist insect impact scores and reflection on problems that the assessors encountered when making their assessments.
Theme | Description |
---|---|
Ambiguous information | Information in the literature was vague, lacking, incorrect, or unconvincing. Often, very little information was provided on the impacts of generally low impact species. Misinterpretation of the ambiguous information provided in the references may have resulted in an under- or over-estimated impact score. |
Discounted details | The assessor unintentionally overlooked details because s/he did not thoroughly read the provided literature. Alternatively, the assessor may have intentionally disregarded details. |
Observed vs. potential impact | Some references provided understated or exaggerated impacts not supported by empirical data or observations. The assessor did not find it acceptable to assign a lower or higher impact score when the species had rarely achieved that potential. |
Prior knowledge | A more specialized assessor had previous knowledge about the insect. Consequently, s/he had more insight than what was provided in the references and/or disagreed with the content in the references based on personal experiences with the insect. |
For this study, we evaluated the efficacy of a detailed nine-point impact scale (Fig.
We found that 11 of the 41 non-native, conifer-specialist insects assessed had perfect agreement among assessors, 24 had a high level of agreement, and only six elicited a low level of agreement. Although Krippendorff’s alpha indicated a moderate level of consensus, the fact that most insects had a high or perfect level of agreement indicated a generally high consensus among assessors. All insects with low agreement among assessors were scored within or on the margin of the medium impact range, whereas the insects with perfect or high agreement among assessors fell near the extremes of their respective impact range. This pattern indicates that divergence in agreement peaked in insects with a medium impact score, perhaps highlighting the challenges associated with determining impact for species that are neither truly benign (low-impact) nor undeniably catastrophic (high-impact). Our use of standardized information may have contributed to this pattern, as this limited the information assessors used to make their assessment. The initial assessor endeavored to select the most comprehensive and accurate references available, but published information can be vague, inaccurate, or misinterpreted. Although we advised assessors not to use their prior knowledge, some assessors drew on specialized expertise when the literature was deficient, while others disagreed with what was written. The joint fact-finding discussions improved understanding and ultimately led to consensus about these medium-impact species. Following the discussions and reassessment, assessors no longer disagreed about the impact level (low, medium, or high) to which each of the 41 insects belonged.
This pattern of highly divergent impact scores may also result from intraspecific variation in impact. For this assessment, we considered a taxonomic definition of impact (
Higher variation among medium impact species highlights the importance of having a robust impact scoring system. Although a few impact assessment scoring systems have multiple levels with detailed descriptions from which to choose (e.g.,
In this study, the overall self-assessed expertise level was low, with most insects eliciting an expertise level below three (moderate expertise). The only species that elicited a moderate-high to high self-assessed expertise (> level 3 on the expertise scale) were high impact species: balsam woolly adelgid (Adelges piceae Ratzeburg), hemlock woolly adelgid, and pine woolly aphid (Pineus boerneri Annand). In a pool of assessors, one would expect to have more assessors with expertise on high- than low-impact insect species because high-impact species generate more research funding and publicity in the academic community (e.g., more peer-reviewed publications) and the general public (e.g., more outreach and awareness efforts) than low-impact species. All three species are high-profile insects with widespread documentation, research, and public reporting, such that even non-specialist scientists may be acquainted enough with these species to rate their expertise level as high. High self-assessed levels of expertise might also be elicited from other high-impact species not included in this study.
Uncertainty is often of concern when assessing impact. It is important for assessors to consider the available information and determine the potential impact that the non-native species has or will have with accuracy and consistency to efficiently allocate resources to management and biosecurity strategies (
Most studies that address expertise and expert opinion also address uncertainty (e.g.,
We observed no associations between the coefficient of variation for impact score and the coefficients of variation for the levels of expertise and uncertainty, as both correlations were non-significant. This suggests that expertise and uncertainty may not influence the interpretation of non-native insect impact. In other words, assessors interpreted the same information and arrived at similar conclusions regardless of specific expertise. This is a good indication that the goal of the HIWG in designing the detailed impact scale was met: the same conclusions would most likely be reached regardless of which group member did the assessing. It is worth noting that although assessors varied in their self-reported expertise, all are trained ecologists with experience interpreting ecological literature and may be considered “experts” as defined by
Consensus-building and other participatory techniques are increasingly cited in the environmental impact assessment literature (e.g.,
The first theme, ambiguous information, was a common problem encountered by the initial assessors as they sorted through the provided literature, much of which was vague or lacking. This problem was especially acute for species categorized as low impact, some of which were scored as level one, indicating that the new assessor read no information regarding impact, whereas the initial assessor documented at least minor damage. We determined that many of these errors were due to ambiguous language in the references (e.g.,
The second theme that emerged concerned discounted details. Some of the sources referenced were lengthy and detailed, while others were more anecdotal and lacked sufficient detail for rigorous evaluation. An assessor who does not carefully read a reference in its entirety may overlook important details about impacts, or may disregard some statements altogether. For example, an assessor may discount a specific older source because subsequent controlled experiments failed to replicate it. This source of variation may be alleviated if an assessor expresses concerns to the other expert assessors during discussion.
The third theme that emerged focused on observed versus potential impacts. Some references discussed potential impacts not yet supported by empirical data or observations and the assessor did not find it appropriate to assign a score based solely on this interpretation of potential. Our assessments were based on documented impacts rather than potential for future impacts (e.g., under predicted global climate change scenarios or once new hosts were accessed). Other impact assessment protocols, such as
The final theme focused on variation from prior knowledge. In some cases, an assessor had more insight than provided in the references, but their perception differed little from the reference. In other scenarios, the assessor had experimental results or insight that did not support or failed to replicate the reference information, so they chose to base their score accordingly. Such decisions can contribute variation, whether or not the assessor incorrectly rejects correct information. This scenario highlights the value of strict, standardized guidelines, and consensus-building techniques (
Additional consensus was achieved through our joint fact-finding activity. The open dialogue among assessors facilitated achievement of consensus because assessors were able to critically evaluate ambiguous statements and, since some members of the group had prior knowledge that they used to inform their decisions, provide background knowledge based on experience not documented in the literature.
As written, the protocol and detailed, nine-point impact scale provided by the HIWG have the potential to result in a lack of consensus, particularly with medium-impact insect species. However, we found that adding joint fact-finding can resolve such discrepancies in impact scoring. We demonstrate that consensus among diverse expert assessors can be achieved for invasive species decision-making and management. When empirical data are lacking for specific species, decision-makers may use broad ecological principles (
All of the references used for this impact assessment are archived in the U.S. Geological Survey ScienceBase Catalog (
We thank Jill Baron and the other U.S. Geological Survey Powell Center staff members for their support and encouragement during our time at the USGS Powell Center, as well as Hunter Snyder from Dartmouth College for providing us with literature and direction on methods used to conduct an impact assessment. We greatly appreciate the feedback provided by Jeff Morisette (National Invasive Species Council Secretariat), the Marsico Lab, Tanja McKay, Virginie Rolland (Arkansas State University), and four anonymous reviewers, who helped us improve our manuscript. This project was conducted as a part of the “Predicting the next high-impact insect invasion: Elucidating traits and factors determining the risk of introduced herbivorous insects on North American native plants” working group supported by the John Wesley Powell Center for Analysis and Synthesis and funded by the U.S. Geological Survey (to KAT, TDM, DAH, and PCT, and Cooperative Agreement No. G16AC00065 to PCT). Additional support was provided by the Nebraska Cooperative Fish and Wildlife Research Unit, University of Washington, USDA Forest Service Eastern Forest Environmental Threat Assessment Center (Grant No. 15-JV-11242303-103 to PCT), and National Science Foundation LTER program (to MPA). The Arkansas State University Environmental Science Program and USDA Forest Service Southern Research Station (Grant No. 14-CA-11330129-036) provided additional support for ANS. Informed consent was obtained from all individual participants included in the study. The Arkansas State University Institutional Review Board (IRB) deemed this specific project as IRB exempt under 45 CFR 46.101(b). The Nebraska Cooperative Fish and Wildlife Research Unit is jointly supported by a cooperative agreement between the U.S. Geological Survey, the Nebraska Game and Parks Commission, the University of Nebraska-Lincoln, the U.S. Fish and Wildlife Service, and the Wildlife Management Institute. 
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.