Data Paper
Corresponding author: Catherine S. Jarnevich (jarnevichc@usgs.gov)
Academic editor: Ramiro Bustamante
© 2024 Catherine S. Jarnevich, Peder Engelstad, Demetra Williams, Keana Shadwell, Cameron Reimer, Grace Henderson, Janet S. Prevey, Ian S. Pearse.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Jarnevich CS, Engelstad P, Williams D, Shadwell K, Reimer C, Henderson G, Prevey JS, Pearse IS (2024) Predicted occurrence and abundance habitat suitability of invasive plants in the contiguous United States: updates for the INHABIT web tool. NeoBiota 96: 261-278. https://doi.org/10.3897/neobiota.96.134842
Invasive plant species have substantial negative ecological and economic impacts. Geographic information on the potential and actual distributions of invasive plants is critical for their effective management. For many regions, numerous sources of predictive geographic information exist for invasive plants, often in the form of outputs from species distribution models (SDMs). A repository of consistently produced regional- or national-scale SDMs predicting the potential distribution of invasive plant species could help managers prioritise invasive species management. Here, we present a novel set of habitat suitability models, not only for the occurrence of 259 manager-requested invasive plant species in the contiguous United States (USA), but also for their abundance (≥ 5% cover) and high abundance (≥ 25% cover). These data update the Invasive Species Habitat Tool (INHABIT; gis.usgs.gov/inhabit). The tool covers the majority of invasive plant species in the contiguous USA that have sufficient location data for model building. INHABIT provides a canonical set of predicted geographic distributions for invasive plants in the contiguous USA that can aid in the search for new populations of invasive plant species and help create watch lists for emerging invaders. Because it covers nearly all of the most problematic invasive plants in the contiguous USA, the tool helps managers prioritise management strategies by showing which plants are already present or abundant in a land management area and which may become present or abundant in the future.
Early detection and rapid response, land management, species distribution modelling, watch list
Invasive species cause considerable damage to ecological and economic systems worldwide (
Predictive habitat mapping, based on species distribution models (SDMs), is the primary tool used to anticipate where invasive plant species will establish and become abundant (
Most SDMs predict the habitats and geographic locations in which an invasive species might establish (i.e. become present). However, managers also want to know where species might become abundant as abundance is correlated with impact (
Effective management of invasive plant species requires the prioritisation of species to control (
In this data paper, we present the first abundance-based suitability models for a large suite of plant species through version 4 of INHABIT and describe the methodology used in its creation. This dataset is the first publicly available resource to provide a large number of predicted habitat suitability maps and management area summary tables for the habitats in which invasive plants may establish and may become abundant. The scope of the tool is invasive terrestrial plant species in the contiguous USA and includes 23% of all introduced vascular plant species with at least 100 georeferenced records and 50% of those with high abundance records (Fig.
The model fitting and summarisation methodology described below represents elements first described by
We asked people managing invasive plant species within the contiguous USA to contribute to a list of terrestrial non-native plant species to include in version 4 of INHABIT. The resulting list identified 286 non-native species. We obtained species occurrence and abundance data for these selected species from existing aggregated occurrence and agency databases (Suppl. material
We restricted species records and observations based on multiple geographic and data quality criteria. We retained records with observation dates ≥ 1980, listed as “observation” or “specimen only” observation types (GBIF only) and a coordinate uncertainty ≤ 30 m. Additionally, we used the “CoordinateCleaner” package (
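The date, observation-type and coordinate-uncertainty filters above can be expressed as a single predicate. The following is a minimal Python sketch rather than the authors' code (their R scripts are provided as Supplement 3). The `Record` fields and the `source` label are hypothetical. Dropping records that report no coordinate uncertainty is an assumption the text does not state. The CoordinateCleaner checks are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_GBIF_TYPES = {"observation", "specimen only"}  # observation types named in the text


@dataclass
class Record:
    # Hypothetical fields for illustration; not an actual database schema.
    year: int
    obs_type: str
    uncertainty_m: Optional[float]  # coordinate uncertainty in metres
    source: str                     # "GBIF" or another database


def keep_record(r: Record, min_year: int = 1980, max_uncertainty_m: float = 30.0) -> bool:
    """Return True if a record passes the date, type and uncertainty filters."""
    if r.year < min_year:
        return False
    # The observation-type filter applies to GBIF records only.
    if r.source == "GBIF" and r.obs_type not in ALLOWED_GBIF_TYPES:
        return False
    # Assumption: records without a reported uncertainty are dropped.
    if r.uncertainty_m is None or r.uncertainty_m > max_uncertainty_m:
        return False
    return True


records = [
    Record(1979, "observation", 5.0, "GBIF"),   # too old
    Record(2005, "observation", 10.0, "GBIF"),  # kept
    Record(2005, "fossil", 10.0, "GBIF"),       # excluded observation type
    Record(2010, "plot", 25.0, "agency"),       # kept (type filter is GBIF only)
    Record(2010, "plot", 50.0, "agency"),       # uncertainty > 30 m
]
print([keep_record(r) for r in records])  # [False, True, False, True, False]
```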
We also obtained locations for all vascular plant species included on the U.S. Register of Introduced and Invasive Species (US-RIIS) list ver. 2.0 (
The greatest advance for version 4 of the INHABIT web tool is the inclusion of SDMs that predict species abundance in addition to SDMs that predict occurrence. Through informal discussions with managers related to previous work with occurrence and abundance habitat suitability models (occurrence, abundance as > 10% cover;
Each observation record was classified as high abundance (≥ 25% cover), abundance (≥ 5% and < 25% cover) or occurrence (< 5% cover or no abundance information). Where numerical cover data were provided, we assigned abundance categories directly from those values. Where cover was reported in numerical bins, we used the minimum cover in the bin as a conservative match to our categories, so that a 5–30% bin would be assigned to the abundance category. Some aggregated occurrence databases included qualitative descriptions of abundance, which we manually classified: occurrence = “trace”, “rare”, “sparse”, “single plant”, “spot”, “light”, “low”; abundance = “medium”, “moderate”, “common”, “patch”, “patchy”, “scattered dense patches”; high abundance = “high”, “dense”, “abundant”, “heavy”, “major”, “dense monoculture”, “dominant cover”.
We used a nested set of observation records to fit models. Occurrence suitability models included observation records for a species from all three categories. Abundance suitability models were fitted with occurrence records from both the abundance and high abundance categories (≥ 5% cover). High abundance suitability models only used observations categorised as high abundance (≥ 25% cover).
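A minimal Python sketch of the cover classification and the nested model groups follows. The thresholds and qualitative terms are taken from the two preceding paragraphs. The function names are hypothetical, and treating an unlisted qualitative descriptor as occurrence is an assumption, since the authors classified these terms manually.

```python
from typing import List, Optional

OCCURRENCE, ABUNDANCE, HIGH = 0, 1, 2  # ordered: a higher value means more cover

# Qualitative descriptors as listed in the text.
QUALITATIVE = {}
for term in ["trace", "rare", "sparse", "single plant", "spot", "light", "low"]:
    QUALITATIVE[term] = OCCURRENCE
for term in ["medium", "moderate", "common", "patch", "patchy", "scattered dense patches"]:
    QUALITATIVE[term] = ABUNDANCE
for term in ["high", "dense", "abundant", "heavy", "major", "dense monoculture", "dominant cover"]:
    QUALITATIVE[term] = HIGH


def classify(cover_min: Optional[float] = None, descriptor: Optional[str] = None) -> int:
    """cover_min is percent cover, or the lower bound of a cover bin (5 for a 5-30% bin)."""
    if cover_min is not None:
        if cover_min >= 25:
            return HIGH
        if cover_min >= 5:
            return ABUNDANCE
        return OCCURRENCE
    if descriptor is not None:
        # Assumption: unlisted descriptors default to occurrence.
        return QUALITATIVE.get(descriptor.strip().lower(), OCCURRENCE)
    return OCCURRENCE  # no abundance information


def records_for_group(categories: List[int], group: str) -> List[int]:
    """Indices of records used by each nested model group."""
    minimum = {"occurrence": OCCURRENCE, "abundance": ABUNDANCE, "high abundance": HIGH}[group]
    return [i for i, c in enumerate(categories) if c >= minimum]


cats = [classify(2), classify(5), classify(descriptor="Dense"), classify()]
print(cats)                                  # [0, 1, 2, 0]
print(records_for_group(cats, "abundance"))  # [1, 2]
```

Because the categories are ordered integers, each nested group is a simple "at least this category" filter, which makes the occurrence set a superset of the abundance set and the abundance set a superset of the high abundance set.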
We spatially thinned species records by reducing observations to a minimum 900 m distance between points using the “geoThin” function in the R package “enmSdmX” (
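Spatial thinning enforces a minimum spacing between retained records. The sketch below uses a simple greedy rule on projected coordinates in metres. It is a stand-in for illustration, not the algorithm implemented by `geoThin` in enmSdmX, and its result depends on the input order.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def greedy_thin(points: List[Point], min_dist: float = 900.0) -> List[Point]:
    """Keep a point only if it lies at least min_dist (metres) from every kept point.

    Assumes projected x/y coordinates in metres. O(n^2) and order dependent:
    an illustrative stand-in, not enmSdmX::geoThin.
    """
    kept: List[Point] = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


pts = [(0, 0), (500, 0), (1000, 0), (1000, 800), (2000, 0)]
print(greedy_thin(pts))  # [(0, 0), (1000, 0), (2000, 0)]
```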
We required at least 100 spatially thinned observations within the contiguous USA to generate an occurrence model for a species. Any requested species with fewer observations was flagged for future modelling efforts using globally sourced observations and predictor data. For abundance and high abundance models, we required at least 50 spatially thinned observations for each model group, and we only considered fitting abundance models for species for which we could fit occurrence models. Through previous INHABIT iterations, we have found it difficult to fit models with fewer than 50 locations and, whereas global repositories provide occurrence data from beyond the USA, we are unaware of any that include abundance information.
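The record-count rules can be summarised as a small decision function. The 100 and 50 thresholds come from the paragraph above. The 143-record cut-off for a test split is reported in the Results. Deriving 143 as the smallest count whose 70% training share reaches 100 is only one possible reading and is an assumption, because the split proportion is not stated here. The function and label names are hypothetical.

```python
import math
from typing import Dict, Optional

MIN_OCC = 100    # thinned records needed for an occurrence model
MIN_ABUND = 50   # thinned records needed for each abundance model group
# 143 is reported in the Results as the cut-off for a test split. One possible
# reading (an assumption) is the smallest n whose 70% training share is >= 100.
MIN_SPLIT = math.ceil(MIN_OCC / 0.7)


def plan_models(n_occ: int, n_abund: int, n_high: int) -> Dict[str, Optional[str]]:
    """Decide which model groups to fit for one species.

    n_abund counts thinned records with >= 5% cover; n_high those with >= 25% cover.
    """
    if n_occ < MIN_OCC:
        # Abundance models are only considered when an occurrence model can be fitted.
        return {"occurrence": "flag for global data", "abundance": None, "high abundance": None}
    return {
        "occurrence": "fit with test split" if n_occ >= MIN_SPLIT else "fit without test split",
        "abundance": "fit" if n_abund >= MIN_ABUND else None,
        "high abundance": "fit" if n_high >= MIN_ABUND else None,
    }


print(MIN_SPLIT)  # 143
print(plan_models(120, 60, 40))
```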
In statistical and machine learning communities (
As we did not have absence data, we required background locations that capture the environments available to each species to fit the models. To account for sampling biases, we used two methods to generate background points for occurrence model training data, a continuous kernel density estimate (KDE) method and a target background approach, and fitted a separate set of models for each species with each method. The continuous KDE method has been suggested for invasive species in particular because observations may be denser in regions where a species was introduced long ago than in regions it has reached only recently (
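One way to implement KDE-weighted background sampling is to evaluate a Gaussian kernel density of the occurrence records at candidate cells and to draw background cells with probability proportional to that density, so that the background mirrors the spatial intensity of sampling. This NumPy sketch assumes projected coordinates and a user-chosen bandwidth, and the function names are hypothetical. It is not the authors' implementation. The target background approach, which commonly draws background locations from records of a related group of species, is not shown.

```python
import numpy as np


def kde_weights(cells: np.ndarray, occ: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel density of occurrence points at candidate cells, normalised to sum to 1.

    cells has shape (m, 2) and occ shape (n, 2), both in projected coordinates.
    """
    d2 = ((cells[:, None, :] - occ[None, :, :]) ** 2).sum(axis=2)  # (m, n) squared distances
    density = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)
    return density / density.sum()


def sample_background(cells: np.ndarray, occ: np.ndarray, n: int,
                      bandwidth: float, seed: int = 0) -> np.ndarray:
    """Draw n distinct background cells with probability proportional to the KDE."""
    rng = np.random.default_rng(seed)
    w = kde_weights(cells, occ, bandwidth)
    idx = rng.choice(len(cells), size=n, replace=False, p=w)
    return cells[idx]


# Toy example: a 10 x 10 grid of cells and three clustered occurrences.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
cells = np.column_stack([xs.ravel(), ys.ravel()])
occ = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0]])
background = sample_background(cells, occ, n=20, bandwidth=1.5)
```

The bandwidth controls how far sampling intensity is smoothed away from the records. A very small bandwidth concentrates the background almost entirely on recorded locations.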
We used 52 of the 54 environmental predictors included in INHABIT version 3 (
For each species, we selected predictors based on individual species characteristics, including biology and lifeform (e.g. winter annual graminoid), and on the species' invaded geographic distribution within the contiguous USA. Predictor sets were consistent between the two occurrence models (i.e. the KDE background and target background approaches) and identical between abundance and high abundance models, except in a few cases where the number of high abundance predictors was restricted because of few observations. To reduce collinearity amongst potential predictors, we assessed pairwise correlations using Pearson, Spearman and Kendall coefficients and removed one of any pair with a correlation coefficient > 0.70 (maximum of Pearson, Spearman or Kendall;
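The correlation screen can be sketched as a greedy filter. Predictors are visited in a priority order, which here is a hypothetical stand-in for the expert, species-specific selection described above. A predictor is dropped if its largest absolute Pearson, Spearman or Kendall coefficient with any already-kept predictor exceeds 0.70. Using absolute values is an assumption. The coefficients are computed directly with NumPy so the example has no other dependencies.

```python
import math
from typing import Dict, List

import numpy as np


def avg_ranks(a) -> np.ndarray:
    """Ranks starting at 1, with ties given their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y) -> float:
    return pearson(avg_ranks(x), avg_ranks(y))


def kendall_tau_b(x, y) -> float:
    conc = disc = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx == dy:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + tied_x_only) * (conc + disc + tied_y_only))
    return (conc - disc) / denom if denom else 0.0


def drop_correlated(preds: Dict[str, np.ndarray], threshold: float = 0.70) -> List[str]:
    """Greedy screen; preds is ordered by (hypothetical) priority, earlier = preferred."""
    kept: List[str] = []
    for name, values in preds.items():
        worst = max((max(abs(pearson(values, preds[k])), abs(spearman(values, preds[k])),
                         abs(kendall_tau_b(values, preds[k]))) for k in kept), default=0.0)
        if worst <= threshold:
            kept.append(name)
    return kept


x = np.arange(20.0)
preds = {"a": x, "b": 2 * x + 1, "c": np.tile([1.0, -1.0], 10), "d": x**2}
print(drop_correlated(preds))  # ['a', 'c']
```

Taking the maximum of the three coefficients makes the screen conservative. For example, `d = x**2` has a Spearman and Kendall correlation of 1 with `a` even though the relationship is non-linear.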
Following the methodology in
For each model group, we created continuous spatial predictions of relative habitat suitability across the contiguous USA at ~ 100 m2 spatial resolution, expedited by U.S. Geological Survey high performance computing resources (
Through informal discussions during presentations, managers expressed interest in having categorical maps of suitability alongside the continuous relative suitability predictions for the three model groups. They also wanted to retain the three threshold options available in INHABIT version 3, which range from more inclusive to more restrictive. Thus, we produced three distinct binary versions of each of the three continuous maps (occurrence, abundance, high abundance) using percentile thresholds of 1%, 5% and 10% to convey a gradient from inclusive (comprehensive) to restrictive (targeted) model output (
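A percentile threshold is commonly defined as the suitability value below which a given percentage of training presence predictions fall. That definition is an assumption here, because the thresholded quantity is not specified in this section. Under it, the 1% threshold is the lowest value and so the most inclusive. The sketch also stacks the three binary maps into a hypothetical 0 to 3 composite that counts how many thresholds each cell passes.

```python
import numpy as np


def percentile_threshold(presence_preds, pct: float) -> float:
    """Suitability value below which pct% of presence predictions fall (assumed definition)."""
    return float(np.percentile(presence_preds, pct))


def binarize(suitability, presence_preds, pct: float) -> np.ndarray:
    """1 where suitability meets the threshold, else 0."""
    t = percentile_threshold(presence_preds, pct)
    return (np.asarray(suitability) >= t).astype(np.uint8)


presence_preds = np.arange(1, 101) / 100          # toy predictions at 100 presences
suitability = np.array([0.0, 0.05, 0.08, 0.5])    # toy map cells
maps = {p: binarize(suitability, presence_preds, p) for p in (1, 5, 10)}
composite = maps[1] + maps[5] + maps[10]          # hypothetical 0-3 composite
print(composite)  # [0 1 2 3]
```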
Managers can utilise summaries of habitat suitability for management areas in various ways, such as to create watch lists (e.g.
We summarised the categorical maps for management areas (Suppl. material
We also generated an updated dataset containing the modelled species’ observation locations to capture the most recent observations following the same steps outlined in the species data section above. Using these data, we counted the number of occurrence locations within each management area boundary and measured the distance from boundaries to the nearest location when no observations fell within the management area.
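Counting records inside a management area, and measuring the distance to the nearest record when none fall inside, can be sketched with a ray-casting point-in-polygon test and point-to-segment distances. This pure-Python version assumes a single simple polygon in projected coordinates (metres). The function names are hypothetical, and a production workflow would more likely rely on a GIS library.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test; poly lists the vertices of a simple polygon (not closed)."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def summarise_area(poly: Sequence[Point], records: List[Point]) -> Tuple[int, Optional[float]]:
    """Return (records inside, distance from boundary to nearest record if none inside)."""
    count = sum(point_in_polygon(r, poly) for r in records)
    if count > 0 or not records:
        return count, None
    nearest = min(segment_distance(r, poly[i], poly[(i + 1) % len(poly)])
                  for r in records for i in range(len(poly)))
    return 0, nearest


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(summarise_area(square, [(5, 5), (20, 5)]))    # (1, None)
print(summarise_area(square, [(20, 5), (13, 14)]))  # (0, 5.0)
```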
We merged the habitat suitability summary information with the count and distance information to support watch list development. Early detection at a local level can be informed by watch lists of doorstep invaders, which we define as species that have suitable habitat in the focal area, no known records within the area and records within a 50- or 100-mile (approximately 80 or 160 km) buffer of the area (
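The doorstep-invader rule then reduces to a short check per species and management area. The function name and status labels in the sketch below are hypothetical. It converts the mile buffers with 1 mile = 1.609344 km, so 50 and 100 miles correspond to about 80 and 161 km.

```python
from typing import Optional

MILES_TO_KM = 1.609344


def watch_list_status(suitable: bool, n_inside: int, nearest_km: Optional[float],
                      buffer_miles: float = 50) -> str:
    """Classify one species for one management area (labels are hypothetical)."""
    if n_inside > 0:
        return "already present"
    if suitable and nearest_km is not None and nearest_km <= buffer_miles * MILES_TO_KM:
        return "doorstep invader"
    return "not on watch list"


print(round(50 * MILES_TO_KM, 1), round(100 * MILES_TO_KM, 1))  # 80.5 160.9
print(watch_list_status(True, 0, 120.0))                    # not on watch list
print(watch_list_status(True, 0, 120.0, buffer_miles=100))  # doorstep invader
```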
Managers requested models for 286 species. Of these, 254 species had at least 143 filtered observations, enough for a spatial test/train data split, and five had between 100 and 142 filtered observations, enough to fit a model without a test split. The remaining 27 species had < 100 filtered occurrences and, therefore, were not modelled. Additionally, 217 species had at least 50 abundance observations (≥ 5% cover) and 189 had at least 50 high abundance observations (≥ 25% cover).
Overall, models performed well: for all three model groups, most continuous Boyce index (CBI) values calculated on the withheld test data exceeded 0.75 for both individual algorithms and ensemble models (Fig.
Histograms of the continuous Boyce Index (CBI) for all fitted models across algorithms and species, calculated for (a) the training data and (b) the withheld test data, and for the continuous ensemble models, calculated for (c) the training data and (d) the withheld test data.
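The continuous Boyce index summarised in the histograms above slides a window along the range of predicted suitability. In each window, it computes the ratio of the share of presence predictions to the share of background predictions. It then reports the Spearman rank correlation between window position and that ratio. Values near 1 indicate that presences become increasingly over-represented at higher predicted suitability. The NumPy sketch below is a simplified version. The window width (10% of the range), the 101 window positions and the function names are assumptions, not the settings used for INHABIT.

```python
import numpy as np


def avg_ranks(a) -> np.ndarray:
    """Ranks starting at 1, with ties given their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def continuous_boyce(presence_preds, background_preds, window: float = 0.1,
                     n_steps: int = 101) -> float:
    """Simplified continuous Boyce index for one model."""
    pres = np.asarray(presence_preds, dtype=float)
    bg = np.asarray(background_preds, dtype=float)
    lo, hi = bg.min(), bg.max()
    width = window * (hi - lo)
    mids, ratios = [], []
    for start in np.linspace(lo, hi - width, n_steps):
        end = start + width
        p_share = np.mean((pres >= start) & (pres <= end))  # share of presences in window
        e_share = np.mean((bg >= start) & (bg <= end))      # share of background in window
        if e_share > 0:                                     # skip empty windows
            mids.append(start + width / 2)
            ratios.append(p_share / e_share)
    # Spearman correlation = Pearson correlation of (average) ranks.
    return float(np.corrcoef(avg_ranks(mids), avg_ranks(ratios))[0, 1])


bg = np.linspace(0.0, 1.0, 1001)                           # toy background predictions
good = np.repeat(bg, np.round(bg * 100).astype(int))       # presences favour high values
bad = np.repeat(bg, np.round((1 - bg) * 100).astype(int))  # presences favour low values
print(continuous_boyce(good, bg) > 0.9, continuous_boyce(bad, bg) < -0.9)  # True True
```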
The continuous maps display relative habitat suitability at a ~ 100 m2 resolution for the contiguous USA. There is a relative suitability map for each model group, including occurrence, abundance and high abundance (see examples in Fig.
Continuous ensemble map for each of the three model groups, representing low to high habitat suitability for Ulex europaeus (a–c) and Tamarix chinensis/ramosissima (d–f): (a, d) occurrence, (b, e) abundance (≥ 5% cover) and (c, f) high abundance (≥ 25% cover). Black indicates areas with novel environmental conditions (values for at least one model predictor outside the range of values captured by the training data for model fitting).
The integrated maps illustrated the differences in the thresholds used to generate them and showed differences in patterns between species (Fig.
Composite categorical map from discretising the continuous ensemble maps in Fig.
From the tabular summaries, across all species and management areas with suitability, the mean percentage suitable area ranged from 20% to 46% for occurrence, 11% to 26% for abundance and 7% to 19% for high abundance. See management summary tables in
We created updated models of habitat suitability for occurrence for 220 species, new occurrence models for 39 species and models of abundance habitat suitability for 217 species (Suppl. material
Model applications for invasive plant management and decision-making are diverse. Regional early detection and rapid response applications allow newly introduced or actively expanding invasive plant species to be monitored before they establish in a management area. Similarly, an “invaders at the doorstep” approach uses models to develop watch lists that can inform early detection efforts for species found nearby, but not yet within, a management area. Abundance models can be used in the same ways, and can further refine surveys or control efforts by prioritising the species likely to have the greatest impact in an area. When used in conjunction with spatial data on vegetation and wildlife resources, these models may help managers identify where areas that can support greater invasive plant abundance intersect with particularly vulnerable native communities. Managers may also prioritise actions based on model outputs and landscape features, for example, when an area predicted to support higher abundance of an invasive plant is next to a road or waterway that may further spread propagules.
The authors have declared that no competing interests exist.
No ethical statement was reported.
Funding for this project came from the Bipartisan Infrastructure Law (Ecosystem Restoration Activity 6: Invasive Species), and this work contributes to the National Early Detection and Rapid Response Framework. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Conceptualization: all; Data curation: CSJ, PE, DW, KS, CR; Formal analysis: all; Funding acquisition: CSJ, JSP, ISP; Investigation: all; Methodology: all; Project administration: CSJ, PE; Software: PE, DW, KS, CR, GH; Resources: CSJ; Supervision: CSJ; Validation: CSJ, PE, DW, KS; Visualization: PE, DW, KS, CR; Writing – original draft: all; Writing – review & editing: all.
Catherine S. Jarnevich https://orcid.org/0000-0002-9699-2336
Peder Engelstad https://orcid.org/0000-0002-3681-9216
Demetra Williams https://orcid.org/0000-0002-5171-8640
Keana Shadwell https://orcid.org/0000-0001-6835-425X
Cameron Reimer https://orcid.org/0000-0002-2058-0538
Grace Henderson https://orcid.org/0000-0001-9542-6888
Janet S. Prevey https://orcid.org/0000-0003-2879-6453
Ian S. Pearse https://orcid.org/0000-0001-7098-0495
All of the data that support the findings of this study and the study outputs are available in the main text, Supplementary Information or as a U.S. Geological Survey data release (
Additional information
Data type: docx
Explanation note: Supplement 1, supplementary figures and tables; Supplement 2, Field Maps instructions; Supplement 3, data processing R scripts.