Research Article

Previous studies on alien species establishment in the United States and around the world have drastically improved our understanding of the patterns of species naturalization, biological invasions, and underlying mechanisms. Meanwhile, relevant new data have been added, and data quality has increased significantly along with the consistency of the related concepts and terminology being developed. Here, using new and/or improved data on native and exotic plant richness and many socioeconomic and physical variables at the state level in the United States, we test whether previously discovered patterns still hold, particularly how native and exotic species richness are related and which factors predominantly control plant naturalization. We found that, while the number of native species is largely controlled by natural factors such as area and temperature, exotic species richness and exotic fraction are predominantly influenced by social factors such as human population. When domestically introduced species were included, several aspects of earlier findings were somewhat altered and additional insights regarding the mechanisms of naturalization could be achieved. With increased data availability, however, a greater challenge ahead appears to be how many and which variables to include in analyses.


Introduction
In the past decades, studies on species naturalizations and invasions in the United States and around the world have drastically improved our understanding of the patterns and underlying mechanisms (e.g., Lockwood and McKinney 2001, Richardson 2011, Simberloff and Rejmánek 2011). Modern ecology continues to change rapidly, partly because of the increased quantity and quality of data and improved analytical technology. For example, studies relating plant species invasions to other biotic (e.g., animal richness), socio-economic, and physical variables demonstrate a remarkable progression in this regard (e.g., various variables used and data interpretations in Stohlgren et al. 2003, Rejmánek 2003, Stohlgren et al. 2006; see also Espinosa-Garcia et al. 2004, Leprieur et al. 2008, Marini et al. 2009, Pyšek et al. 2010, Albuquerque et al. 2011, Bartomeus et al. 2011, Koch et al. 2011, Williamson et al. 2011). Meanwhile, many newly added variables are continuously found responsible for previously observed patterns and processes. As a result, interpretations and conclusions change, sometimes leading to new insights.
On average, in the 48 conterminous US states, about 25% of naturalized plant species are domestically introduced from other states; counting them as exotics significantly increases the exotic richness and simultaneously decreases the earlier reported native richness in each state (Kartesz 2011). For example, out of 865 exotic plant species in North Carolina, 166 were actually introduced from other states but were treated as 'native' species in earlier analyses (for related statements and consequences, see Rejmánek and Randall 1994, McKinney 2005, Guo 2011, and Pyšek 2011). The corrected native and exotic richness data could potentially affect previously revealed relationships and their interpretations (a related issue of data quality and comparability in biological invasions has also been raised by Hulme and Weser 2011). For instance, using a dataset on plant richness in which native and exotic richness were defined using state, rather than national, boundaries, Guo and Ricklefs (2010) found that species-area curves (for both natives and exotics) and exotic fraction-area relationships changed from previously reported results. However, several other related aspects remain unexplored. For example, increased exotic richness and decreased native richness have drastically increased the exotic fraction (a measure of the degree of naturalization, or DN) for each state, although the corresponding figure for the entire United States does not change. Also, how additional variables (e.g., geographical, social, economic) might be related to the new figures for native vs. exotic richness needs to be re-examined. Indeed, when data quantity and quality have increased substantially over time, it is reasonable to suspect that one may find patterns different from those in previous studies.
At the state level, previous studies have examined and found significant effects of native richness, area, latitude, elevation, human population, the time since admission to the Union, and year of publication on exotic species richness (or exotic fraction) across the United States (e.g., McKinney 2001, Stohlgren et al. 2003, Rejmánek 2003, Guo and Ricklefs 2010). Here, using the dataset provided by Kartesz (2011), we re-examine the effects of several additional variables related to geography (location), biology (native richness), socio-economics, and physical features in each of the 48 contiguous US states to determine factors potentially influencing exotic plant naturalization in the United States (Table 1, S1). We also investigate whether and to what degree the variables involved might be spatially correlated and whether this makes a difference in data interpretation in this particular case.

Table 1. Results from multiple regression analyses showing the relationships between selected land-cover types and the corrected richness of native (a) and exotic (b) vascular plants and the exotic fraction (c) in the 48 conterminous US states (bold-faced P-values highlight the significant relationships). Temperature and precipitation represent mean annual temperature (°C) and mean annual precipitation (cm), respectively. Here, exotic fraction was angular transformed; native and exotic richness, population size, years in the Union, and the number of ecoregions were log transformed; and the rest (mostly related to area) were square-root (sqrt) transformed before analyses.

Methods
Here we follow the definition of naturalized plant species by Richardson et al. (2000): alien plants that reproduce constantly and sustain populations over many life cycles without direct intervention by humans. Usually, 20 to 60% of naturalized plant species are invasive species (spreading at considerable distances from parent plants) (Rejmánek 2000a,b, Pyšek et al. 2002). We obtained the exotic and native richness data for plants in each of the 48 conterminous continental US states from Kartesz (2011). This source somewhat overestimates numbers of naturalized species because "exotics" also include some casual, not completely naturalized species. However, this is the best available approximation of the naturalized species numbers. Kartesz, in the second edition of his "Floristic Synthesis" (2011; see also Guo and Ricklefs 2010), defined exotics based on state boundaries (i.e., with domestic introductions among states included). This improved (or corrected) approach to estimating species richness increased the number of exotic species and at the same time reduced the number of native species compared to previously used figures. To assess the degree of naturalization (DN) in each state, we then calculated the exotic fraction as (exotic species/[native + exotic species]). Even though states are not natural units, we focus on the state level throughout this study so that comparisons can be made with other state-level studies.
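The exotic fraction defined above, and the effect of reclassifying domestically introduced species from "native" to "exotic", can be illustrated with a minimal sketch. The function names and the species counts below are hypothetical and chosen only for illustration; they are not taken from Kartesz (2011).

```python
def exotic_fraction(native, exotic):
    """Degree of naturalization (DN) = exotic / (native + exotic)."""
    total = native + exotic
    if total <= 0:
        raise ValueError("native + exotic richness must be positive")
    return exotic / total


def reclassify_domestic(native, exotic, domestic):
    """Move `domestic` species, previously counted as native, to the exotic column.

    Total richness is unchanged; only the split between the two groups moves.
    """
    if not 0 <= domestic <= native:
        raise ValueError("domestic count must lie between 0 and native richness")
    return native - domestic, exotic + domestic


# Hypothetical counts for a single state (illustration only).
old_native, old_exotic = 2000, 500
new_native, new_exotic = reclassify_domestic(old_native, old_exotic, domestic=100)

print(exotic_fraction(old_native, old_exotic))  # 0.2
print(exotic_fraction(new_native, new_exotic))  # 0.24
```

Because the reclassification leaves total richness unchanged, any increase in exotic richness raises the state-level exotic fraction, whereas a national-level fraction would be unaffected by moving species between states.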
To examine the naturalization patterns related to geography (relative locations of each state), we made a simple comparison between border and interior states. The states bordering large bodies of water (i.e., oceans and the Great Lakes) were defined as border states and the rest as interior states. To examine the possible effects of selected social, economic, and physical variables on the naturalization patterns across the 48 conterminous US states, we related the number of native and exotic species and the exotic fraction to the human population, years since joining the Union, climate condition, area, land cover types (below), and the number of ecoregions of each state (Bailey 1998). We performed multiple regression analysis to identify the effects of the social, economic, and physical factors on native and exotic richness and the exotic fraction across the 48 states. To elucidate the relationship structure among the selected state variables, we also performed a Principal Component Analysis (PCA). The selected variables were either log (e.g., area) or square root (exotic fraction) transformed to yield approximately normal distributions and to linearize relationships (see Table 1).
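As a rough illustration of the transformation and multiple regression steps described above, the following sketch uses only the Python standard library. It is not the SAM implementation used in this study; the function names and the small data set are hypothetical, and the angular (arcsine square-root) transform is included because the Table 1 caption mentions it.

```python
import math


def log10_transform(values):
    return [math.log10(v) for v in values]


def sqrt_transform(values):
    return [math.sqrt(v) for v in values]


def angular_transform(proportions):
    """Arcsine square-root transform for proportions in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in proportions]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(predictors, response):
    """Ordinary least squares via the normal equations.

    predictors: one row of predictor values per state.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, already-transformed predictors for five "states" (illustration only).
x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [2.0 + 3.0 * a - 1.0 * b for a, b in x]  # response built from known coefficients
print(ols(x, y))  # approximately [2.0, 3.0, -1.0]
```

Solving the normal equations directly is adequate for a small illustration but becomes numerically fragile when predictors are strongly collinear, which is one reason dedicated statistical software is preferable for real analyses.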
The climate data (i.e., mean annual temperature, mean annual precipitation) for each state were obtained from http://www.cdc.noaa.gov/data/usclimate/ and land cover data from http://www.allcountries.org/uscensus/ (1997). For the land cover data for each state, "developed area" includes urban and built-up areas such as highways, roads, cemeteries, airports, golf courses, landfills, small parks and other transportation facilities. "Cropland" includes both cultivated and non-cultivated lands such as hay fields and horticultural cropland. "Forestland" also includes land stocked by single-stemmed woody species, land of natural regeneration of tree cover, and land not currently developed for non-forest use. "Pastureland/Rangeland" includes land managed primarily for the production of introduced forage plants for livestock grazing and grasslands, savannas, many wetlands, some deserts, and tundra with climax or potential vegetation composed principally of native grasses, forbs or short shrubs suitable for grazing and browsing, and introduced forage species that are managed like rangeland species.
To analyze spatial autocorrelation for all selected variables, we calculated geodesic distances using the latitudinal and longitudinal data based on the center-point of each state. To examine the relative contribution of spatial autocorrelation, we applied and compared the results from both the ordinary-least-squares (OLS) estimation and spatial autoregression analyses (SAR). These two and the PCA analyses were performed using SAM (Spatial Analysis in Macroecology) (Rangel et al. 2006), which is freely available at www.ecoevol.ufg.br/sam.
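The distance and autocorrelation calculations described above can be sketched as follows. This is a simplified illustration rather than the SAM implementation: it assumes a spherical Earth (haversine formula, mean radius 6371 km) and binary weights for pairs of states that fall within a distance class; the coordinates and values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def morans_i(values, coords, d_min, d_max):
    """Moran's I for one distance class.

    Pairs (i, j), i != j, get weight 1 if d_min <= distance < d_max, else 0.
    coords: list of (lat, lon) center-points, one per state.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, weight_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = haversine_km(coords[i][0], coords[i][1], coords[j][0], coords[j][1])
            if d_min <= d < d_max:
                cross += dev[i] * dev[j]
                weight_sum += 1
    denom = sum(d * d for d in dev)
    if weight_sum == 0 or denom == 0:
        raise ValueError("no pairs in this distance class, or no variance in values")
    return (n / weight_sum) * (cross / denom)


# Hypothetical center-points spaced one degree apart along the equator.
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbors always differ

print(morans_i(clustered, coords, 0.0, 150.0))    # positive
print(morans_i(alternating, coords, 0.0, 150.0))  # negative
```

Repeating the calculation over successive distance classes gives a correlogram of the kind summarized in Fig. S1; significance would additionally require a randomization test, which is omitted here.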

Results
In contrast to the previously reported significant relationship between native and exotic species richness estimated based on the US boundary, the relationship became non-significant when the corrected values were used (i.e., species truly native or exotic to each of the 48 states, rather than to the entire continental US) (Fig. 1). The states with higher foreign exotic richness or fraction also had higher domestic exotic species richness or fraction (r² = 0.83, P < 0.0001). Analyses using the improved naturalized (Kartesz's "exotic") and native richness data across the 48 continental US states showed geographic (isolation) effects; i.e., although there was no difference in native richness between the border (coastal) states (with isolation on part of their borders) and interior states, the former had higher exotic richness and fractions than the latter (Fig. 2a). The exotic fraction decreased with state area, but the rate of decline was significantly higher for the interior states than for the border states (Fig. 2b; t = 3.79, P < 0.001). The five states in the conterminous continental United States with the highest exotic fractions were all border states with rather small areas in the Northeast: Massachusetts (84%), New York (71%), Pennsylvania (61%), Connecticut (60%), and Maine (55%). The states with the lowest exotic fractions were in relatively dry, interior areas: Arizona (13%), Nevada (13%), New Mexico (13%), Wyoming (16%), and Colorado (17%). In addition, our data show that the border states also have higher proportions of domestic exotics (i.e., domestic exotics/all exotics = 22%) than the interior states (15%; chi-square test, df = 1, P < 0.001).
Native richness was positively related to land area, temperature, human population size, and the number of ecoregions (as a measure of habitat diversity), but negatively related to the area of croplands. By contrast, exotic species richness and exotic fraction were predominantly influenced by social factors (i.e., human population size). Exotic richness was also positively related to the number of years since joining the Union, and exotic fraction was also negatively affected by land area (Guo and Ricklefs 2010) and cropland (marginally; see Table 1, S1). Again, not surprisingly, both domestic and foreign exotic plants showed similar relationships with selected biotic, social/economic, and physical factors (not shown).
Results from PCA, which extracts orthogonal axes, depicted a strong correlation structure (collinearity) among the selected state variables for the 48 conterminous continental US states. Several independent variables such as the number of ecoregions, human population size, pasture/rangeland, and urban area were positively related to each other and related to the response variable, native richness, along the first (horizontal) axis. Independent variables such as years in the Union and human population size were also positively related to each other and related to the exotic richness and exotic fraction (Fig. 3). The first principal component accounted for 37% of the total variance and the first two components (out of 13) accounted for 64% of the total variance.
All variables except human population size were strongly correlated over space, but at different distances (Fig. S1). The state land area showed positive autocorrelations over the shortest distances and the number of eco-regions showed positive autocorrelations over the smallest distances, with other variables at intermediate distances. Interestingly, the exotic richness (and exotic fraction) exhibited significant positive spatial autocorrelation at a larger distance than native richness, suggesting greater homogenization (or similarity) in exotic floras across the 48 states than in native floras (see also Rejmánek 2000b). However, as the distance continues to increase, the native richness, exotic fraction, land area, the number of eco-regions, and precipitation exhibited significant negative autocorrelations at the farthest distance, whereas the forestland, cropland, pasture/range, temperature, and exotic richness exhibited U-shaped spatial autocorrelations, which may indicate a scenario similar to "one big patch" (i.e., the values are all significant and positive at short and large distances but negative at intermediate distances) proposed by Fortin and Dale (2005) (Fig. S1).
To test the relative strength of spatial autocorrelation, which was measured based on the geodesic distances among the 48 states, we performed ordinary-least-squares (OLS) estimation and spatial autoregression analyses (SAR) that took both predictor variables and space (autocorrelation) into account. We then compared the results from both approaches. AICc values indicated that OLS (ordinary least squares multiple regression analysis) produced the best-fitting models for native and exotic species richness and for exotic fraction (Table S1), despite the contributions from spatial autocorrelation in certain variables.
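The AICc comparison mentioned above relies on a small-sample correction to the Akaike information criterion. A minimal sketch of the least-squares form is given below; it assumes Gaussian errors, counts k as all estimated parameters (including the intercept and the error variance), and uses hypothetical residual sums of squares. It is not the SAM implementation, and a likelihood-based AICc for a SAR model would be computed from that model's own log-likelihood.

```python
import math


def aicc_least_squares(rss, n, k):
    """Small-sample corrected AIC for a least-squares fit.

    rss: residual sum of squares
    n:   number of observations (e.g., 48 states)
    k:   number of estimated parameters, including intercept and error variance
    """
    if rss <= 0:
        raise ValueError("rss must be positive")
    if n - k - 1 <= 0:
        raise ValueError("too few observations for the AICc correction")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits to the same 48 observations (illustration only).
simple_model = aicc_least_squares(rss=12.0, n=48, k=4)
larger_model = aicc_least_squares(rss=11.8, n=48, k=9)

# The model with the lower AICc is preferred; here the small gain in fit
# does not offset the penalty for five extra parameters.
print(simple_model < larger_model)  # True
```

The correction term grows quickly as k approaches n, which matters at the state level, where only 48 observations are available for a dozen or more candidate predictors.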

Discussion
In agreement with several previous studies (Rejmánek 2003, Stohlgren et al. 2006), the new results demonstrate how critical the choice of independent variables is in drawing conclusions; that is, adding or removing certain variables, due to either data availability or author discretion, can influence results and data interpretation. It is understandable that, in some cases, one or more variables are not analyzed owing to lack of data, although this might lead to biased explanations regarding the mechanisms underlying observed patterns. Indeed, the variables in ecological analyses are often constrained by data availability rather than author discretion. As Rejmánek (2003) showed, when certain variables are added or removed, the conclusions can sometimes change drastically. Increased data availability poses challenges for choosing variables and analytical tools in data analysis. For example, when temperature is considered a potentially important factor, choices must be made between using mean annual temperature, temperature in the warmest/coldest month (or quarter), degree days of temperature above or below a certain level, and extreme temperatures. Some of the temperature variables might show significant relationships with the dependent variables while others may not. Similarly, there are many variables associated with human activities (e.g., population size/density, road density, energy consumption) that are interrelated, and each may contribute differently to the observed patterns of biotic invasions (e.g., Lin et al. 2011).
Because the border states are partly isolated from the interior states, they should be less accessible to some domestic exotics but more accessible to foreign exotics, owing to proportionally more and larger international airports and seaports and earlier exposure to foreign sources of propagules (Koch et al. 2011). However, surprisingly, our data show that the border states still have a higher proportion of domestic exotics. It remains puzzling how this paradoxical pattern has emerged. It is possible that domestic traffic (travel, trade) among the border or coastal states and from interior states to border states still exceeds traffic among interior states, but further examination of this phenomenon is clearly needed.
While the number of native species is related to both natural and social variables, exotic richness and fraction are predominantly influenced by human factors (see also Pyšek et al. 2010). The factors related to native richness are readily interpreted: larger area through the species-area effect; human populations achieve greater density in more productive and heterogeneous areas that also support richer native flora (McKinney 2001, Rejmánek 2003); warmer, more southern latitudes typically support more species; croplands diminish the area of native habitat. In contrast, influences on exotic species richness and the exotic fraction are more complex but mostly related to socio-economic activities. The positive effect of human population on the number of exotic species and exotic fraction is likely associated with the primary sources and points of introduction in the United States (e.g., Blackburn and Duncan 2001 for birds; Gavier-Pizarro et al. 2010 for plants; Table 1, S1). The negative effect of state land area may be due to the reduced pool size of domestic exotics; that is, the larger the state, the smaller the outside pool of domestic exotic species within the United States (Guo and Ricklefs 2010). Also, in general, smaller states were admitted to the Union earlier, and their history of intensive disturbance and species introduction was therefore longer (Rejmánek 2003). The strong relationship between foreign and domestic exotic richness might indicate that domestic and foreign exotic plants exhibit similar patterns and mechanisms of naturalization across the 48 states despite the different sources of exotics.
Two major issues deserve attention. First, it would be reasonable to argue that at least some of the differences in previously described patterns of species invasion or naturalization, even from the same focal habitat or area, stem from either inconsistent definitions or inconsistent practice in how to "correctly" count 'exotics'. As Hulme and Weser (2011) point out, a greater challenge ahead is how to ensure data quality and to standardize the data collected from different habitats and regions so that accurate and meaningful comparisons can be made. To date, many large databases have not distinguished between "domestic" and "foreign" exotic species (Guo 2011). Recent moves to increasingly connect disparate databases of variable quality without some consistent quality control may lead to erroneous conclusions (Hulme and Weser 2011, Pyšek 2011).
Second, increased data availability often leads to data dependency over space, time, or both, and thus to violation of the assumptions of many statistical tests. It is still not clear whether, and to what extent, spatial or temporal autocorrelation and collinearity contribute to the inconsistency in earlier studies. In our particular case, spatial autoregressions confirm the results from multiple regressions and increase confidence in data interpretation. The OLS and SAR gave consistent results (Table S1), suggesting that the explanatory variables are also spatially autocorrelated (see Fig. S1). Thus, removing any autocorrelation among the explanatory variables would also remove most of their explanatory power. Unlike the native or exotic richness and exotic fraction, the residuals of most variables do not exhibit spatial autocorrelations (V. Jarosik, personal communication; see also Dormann et al. 2007, Pyšek et al. 2010). Therefore, in agreement with the state-level findings of social scientists (Wasserman and Stack 1995), spatial autocorrelation does not seem to be a serious problem in our analyses at the state scale. However, the spatial autocorrelations of different variables over varied distance intervals do offer additional details regarding their spatial patterns and could potentially reflect the effects of underlying ecological gradients.
The multiple regressions confirm the positive effects of human population size on both exotic species richness and exotic fraction in the United States. Collinearity seems a greater statistical challenge than spatial autocorrelation. However, neither collinearity nor spatial autocorrelation seems to affect the overall results in this particular case. Nevertheless, knowing how the selected variables are spatially or temporally correlated might be informative, as they could affect the response variable interactively. When strong collinearity is detected, substantially reducing the number of variables would be an easy fix, but, at the same time, information and insights may be lost when ecological processes are influenced by factors beyond those selected. Further, adding more variables offers potentially more hypotheses and tests, and more detailed interpretations.
In summary, using newly added and improved data provides new insights regarding the plant naturalization mechanisms across the United States. All independent variables previously used in state-level analyses, such as human population and area, were also found to be significantly related to native and exotic plant richness. Yet, when additional variables were added, we found more variables that were significantly related to native and exotic richness and exotic fraction. Also, in this particular study at the state level, the different statistical methods adopted here produced remarkably similar results regardless of spatial autocorrelation. However, a greater challenge ahead is how to properly handle greater numbers of variables with increased data availability, and caution is needed when dealing with data at other spatial scales (e.g., county level).

Figure 1. An example showing how the improved data of native vs. exotic species altered previously described patterns of species naturalization in the United States. Using corrected values (i.e., species truly native or exotic to each of the 48 states, rather than to the entire continental United States), the relationship between native and exotic species richness became non-significant, as indicated by the solid dots and dashed regression line. This result is in direct contrast with the previously reported significant relationship (open circles and solid regression line).

Figure 2. An example of geographical effects on the native and exotic richness in the conterminous continental United States. There was no significant difference in native richness between the border and interior states of the US, but the border states showed significantly higher exotic richness and fraction than the interior states (t-test, P < 0.05). The exotic fraction decreased with state area and the interior states showed a greater decline. Here, natives and exotics were estimated using the states' own borders (bi-directional bars = SD).

Figure 3. Results from Principal Component Analysis (PCA), which extracts orthogonal axes and shows the two-dimensional (PC1 and PC2) correlation structure among the selected state variables for the 48 conterminous continental US states. Here, "temp" represents temperature (°C) and "precip" represents precipitation (cm).

Figure S1. Spatial autocorrelation coefficient (Moran's I) of species richness, exotic fraction, and other state variables (black lines) and their residuals (gray lines) across the 48 conterminous continental US states. The data points above the upper or below the lower horizontal lines in each panel indicate significant spatial autocorrelations based on randomization (i.e., P < 0.05), using the Monte Carlo randomized data (distances; 200 replicates). For most variables, residuals do not show spatial autocorrelation (see Dormann et al. 2007).