Research Article |
Corresponding author: Rezvan Hatami ( rezvan.hatami@hotmail.com ) Academic editor: Cascade Sorte
© 2022 Rezvan Hatami, Graeme Inglis, Stephen E. Lane, Abraham Growcott, Daniel Kluza, Catherine Lubarsky, Charlotte Jones-Todd, Kimberley Seaward, Andrew P. Robinson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Hatami R, Inglis G, Lane SE, Growcott A, Kluza D, Lubarsky C, Jones-Todd C, Seaward K, Robinson AP (2022) Modelling the likelihood of entry of marine non-indigenous species from internationally arriving vessels to maritime ports: a case study using New Zealand data. NeoBiota 72: 183-203. https://doi.org/10.3897/neobiota.72.77266
|
The establishment of marine non-indigenous species (NIS) in new locations can degrade environmental, socio-cultural, and economic values. Vessels arriving from international waters are the main pathway for the entry of marine NIS, via exposure due to ballast water discharge (hereafter, ballast discharge) and biofouling. We developed a systematic statistical likelihood-based methodology to investigate port-level marine NIS propagule pressure from ballast discharge and biofouling exposure using a combination of techniques, namely k-Nearest-Neighbour and random forest algorithms. Vessel characteristics and travel patterns were assessed as candidate predictors. For the ballast discharge analysis, the predictors used for model building were vessel type, dead weight tonnage, and the port of first arrival; the predictors used for the biofouling analysis were days since last antifouling paint, mean vessel speed, dead weight tonnage, and hull niche area. Propagule pressure for both pathways was calculated at a voyage, port and annual level, which were used to establish the relative entry score for each port. The model was applied to a case study for New Zealand. Biosecurity New Zealand has commissioned targeted marine surveillance at selected ports since 2002 to enable early detection of newly arrived marine NIS (Marine High-Risk Site Surveillance, MHRSS). The reported methodology was used to compare contemporary entry likelihoods between New Zealand ports. The results suggested that Tauranga now receives the highest volume of discharged ballast water and has the second most biofouling exposure compared to all other New Zealand ports. Auckland was predicted to receive the highest biofouling mass and was ranked tenth for ballast discharge exposure. Lyttelton, Napier, and New Plymouth also had a high relative ranking for these two pathways. The outputs from this study will inform the refinement of the MHRSS programme, facilitating continued early detection and cost-effective management to support New Zealand’s wider marine biosecurity system. More generally, this paper develops an approach for using statistical models to estimate relative likelihoods of entry of marine NIS.
ballast water, biofouling, biosecurity, marine ecosystems, non-indigenous species, vessels
Marine ecosystems are vulnerable to the introduction of non-indigenous species (NIS) through human activities such as international trade and transportation, which have increased greatly in the last fifty years (
The two main pathways associated with the spread of marine NIS via shipping are ballast water discharge (hereafter, ballast discharge) and biofouling exposure (
The use of predictive models can be a powerful tool for helping to understand the dynamics of invasion pathways through the exploration of factors that influence propagule pressure (
Many studies have identified high-risk invasion pathways (
The Marine High-Risk Site Surveillance (MHRSS) Programme was established in 2002 to facilitate the early detection of selected marine NIS that were considered likely to impact on New Zealand’s marine values. Currently, 11 ports and marinas are considered ‘high-risk’ for marine NIS entry and establishment (
The aim of this research is to develop a systematic statistical likelihood-based approach for predicting marine invasions. We investigate whether the ports and marinas currently surveyed as part of the MHRSS programme are most likely the locations where marine NIS will enter New Zealand with internationally arriving vessels via the ballast and biofouling pathways. We then assess whether the allocation of survey effort at each location should be modified based on these findings.
Datasets were provided by Biosecurity New Zealand (BNZ), the government agency responsible for New Zealand’s biosecurity system. The main challenge in predicting the hazard presented by ballast discharge for the current case study is that contemporary ballast discharge records were not available. BNZ collected ballast discharge intention declaration data up until 2008, but risk analysts were concerned that the pattern and volume of transits may have changed substantially, hence the need to connect the historical and contemporary data described below.
The challenge in predicting the hazard due to biofouling exposure is to construct a model that permits prediction of the biofouling level based on easily captured vessel statistics. As described below, we used data from a historical biofouling study to construct this model.
For vessel ballast water and biofouling pathways we used two sets of historical and contemporary data as described below.
A historical dataset of biofouling and ballast water declarations was compiled for vessels that visited New Zealand during 1998 to 2008, which contained records for 15,745 voyages by 2,607 vessels. This dataset contained information on the vessel’s physical and voyage characteristics, and ballast discharge volumes detailed for each ballast tank. This dataset was examined to determine: (i) unique voyage and port visits by vessels and volumes of ballast discharged per vessel at each port, and (ii) management of the ballast and intention to discharge in New Zealand. The response variables used for ballast water modelling were the port of discharge and discharge volume per port. Vessel type, ballast capacity or Deadweight Tonnage (DWT), arrival port, and the intention of discharge were candidate predictor variables for estimating how much ballast water was discharged at each port. Arrival port is the first New Zealand port a vessel encounters in a voyage. Many voyages declared more than one port of ballast discharge, i.e., each journey has a first arrival port, but the vessel might visit other ports as part of the same voyage and discharge ballast. For example, a vessel’s voyage to New Zealand may include visits to Whangarei, Tauranga and Auckland, with intended ballast discharge at Whangarei and Tauranga but not Auckland; here the recorded intended ballast discharge volumes for the first two ports (i.e., those at which ballast was discharged) were tallied. The historical data contained 15,628 unique voyages out of which 1,196 journeys had more than one port of ballast discharge. Out of 17,703 stops at ports of arrival, there were 6,287 incidents of reported ballast discharge. Note that the discharge may have presented an acceptable biosecurity risk, if the discharge were carried out in a regulated way, for example after open-ocean exchange. ‘Intent to discharge’ was not used as a predictor because there was poor agreement between the declared intention to discharge and the reported discharge volume. More information about this agreement can be found in the Suppl. material
Biofouling on the submerged surfaces of 322 international merchant vessels was sampled by
Data on the presence and mass of biofouling from 314 of the merchant vessels and associated data from the questionnaire were used to build the biofouling model. As the contemporary arrival data to which the model would predict contained only limited information on the recent voyage history of the vessels, the models were fitted using variables from the historic study that could be easily matched to contemporary information collected by BNZ. They included design specifications of the vessel (e.g., class type, design speed), vessel age, niche area, the age of antifouling coatings on arrival, and the maximum periods of inactivity (lay-up) in any location. For biofouling data, the main two response variables were biomass of biofouling per square metre and presence/absence of any biofouling. The latter had a value of 1 whenever biofouling was present, and zero if there were no biofouling. This binary variable was used to correct mass; that is, whenever mass had a zero value, but biofouling was present, a correction value of 0.001 g was added to the mass. Predictor variables in the historical biofouling data were the age of antifouling coating, vessel type, vessel age, vessel speed, season, niche, and maximum duration of lay-up. The description of the variables and their ranges are given in the Suppl. material
Contemporary data sourced from BNZ on international vessel arrivals into New Zealand were a combination of two data sets. The first included data on vessel arrivals between January 2015 to December 2017, and was derived from mandatory pre-arrival documentation provided by the vessel master to BNZ (
The ballast discharge model was used to predict the total annual volume of ballast water discharged at each New Zealand port. The contemporary data lacked ballast discharge information, so we had to predict it from historical data. The models were built to predict ballast discharge at each port as a function of arrival port and ship characteristics.
The historical biofouling data from
This section reports construction of an algorithm to predict the volume (and at which ports) a vessel will discharge their ballast water, given their vessel-level characteristics (e.g., type, DWT, Ballast capacity) and journey-level information. The complex aspect of this endeavour is that after entering New Zealand’s waters, each vessel may visit more than one port, with the possibility of discharging ballast water at one or more of the ports visited. As a result of discharging at one or more ports, the variable of interest is structured as a vector for each vessel, so any algorithms investigated are required to take this vector-based outcome into account.
An algorithm based on k-Nearest Neighbours (KNN,
Denoted by X the n × p matrix of data used for model training (the training data is the historical data as described in the Suppl. material
i.e., the predicted discharge values are the average of the kth nearest neighbours discharges.
i.e., B̂t[v] is the total predicted ballast discharge in port v during year t.
The KNN algorithm does not work for categorical predictors, as there is no defined distance metric for mixtures of categorical and numerical predictors. To allow for categorical predictors (such as vessel type), the KNN algorithm was implemented on numerical predictor matrices created within each level of a category. For example, to use vessel type and dead weight tonnage as predictors, the algorithm was run on the matrix X formed from the subset of voyages from each level of vessel type until all predictions were made.
The historical data were structured such that each row of the dataset represented a single event in a vessel’s voyage. For example, one row may consist of the arrival event, with no discharge, and the next row a discharge event for the same vessel at a different port. It was important that if a vessel did not discharge in a port, then this non-event was not recorded. In order to have a complete vector response for each voyage, it was reasonably assumed that non-events resulted in a 0 discharge being recorded. An example of this transformation is provided in Suppl. material
The choice of which predictors to include, and how many nearest neighbours to use in the prediction was made using cross-validation. Given our prediction target of total ballast water volume at a port within a year, our cross-validation strategy was to leave out a single year at a time to form the testing set, with the remaining year’s data forming the training set. A schematic of cross-validation used for KNN analysis is given in Suppl. material
In order to provide an unbiased validation of the chosen model and help understand its operational characteristics, the data were split into two: model-fitting data comprising all voyages within the years 1999 to 2005 (inclusive) and validation data comprising all voyages within the years 2006 and 2007.
To estimate the prediction error from the KNN algorithm, a bootstrap procedure was used. Voyages were randomly sampled with replacement from the training data, and the KNN algorithm fit on each bootstrap sample. The standard deviation (and mean/median) of total discharge within a port was calculated from 100 bootstrap resampled values of total discharge.
Several different modeling approaches were applied to the historical data in the process of model selection to estimate biofouling biomass for the contemporary dataset (Suppl. material
Random forests analysis was applied to the data because the algorithm is indifferent to the conditional distribution of the response variable, meaning that no special provisions need to be taken to allow for the heavy zero inflation. The random forest model explained about 15% of the variation in the biofouling mass, which was better than its alternatives. Quantile random forests was used to obtain journey-specific prediction distributions which could then be aggregated to obtain overall levels of uncertainty for the exposure of each port to biofouling for each year.
The likelihood of entry via biofouling was assumed to be proportional to the biofouling exposure at each port, each year. For a single vessel, this was assumed to be the predicted total mass of biofouling in the niche areas of the vessel (not including the hull as a niche area). However, the contemporary data did not contain biofouling mass measurements for each vessel but did contain measurements for each vessel’s niche area in metres squared. To handle this, two models were constructed: a model to predict the biofouling mass per vessel as a function of several variables, and another model to predict the quantiles of the prediction distribution. The random forest model used the following continuous predictor variables: dry weight tonnage, days since last antifouling paint, niche area, and mean speed. Vessel type was assessed and was excluded from the models because the improvement of fit was insufficient.
To predict the density of biofouling for each vessel in the contemporary data, missing data were first handled using multivariate imputation via chained equations (
We calculated an entry likelihood score from ballast discharge and biofouling propagule pressure on a scale of 0 (lowest) to 1 (highest). That is, the mean values from the estimated prediction distribution were scaled to each of the respective discharge and biofouling amounts by the maximum amount within the ports, which sets the scaled score to 1 for the port with the highest amount. We did not subtract the smallest so that no port was allocated a weight of 0. These scores were then averaged to give the relative entry likelihood score for each port. Weights were then calculated in proportion to these relative scores. The weights sum to 1 across the 15 ports and could, therefore, be used to allocate total surveillance effort among ports relative to the likelihood of entry of marine NIS for each port.
We fitted the KNN model using all combinations of vessel type and arrival port as categorical variables, DWT or date of arrival as the nearest neighbour matrix, and k = 1,3,5 nearest neighbours to the model data set. Results of the cross-validation in Table 5 (Suppl. material
The best model arising from the cross-validation fitting study of the KNN algorithm as selected was used to predict contemporary ballast discharge per port. For this application, the full historical data (1999–2007) were used for training. Fig.
Based on the Random Forest statistics, e.g., Gini impurity index and MSE (mean squared error), important variables for biofouling prediction were days since last antifouling paint, niche area, vessel speed and DWT. Variability in the modelling was accounted for by predicting quantiles of the vessel biomass (Fig.
Predicted vessel biofouling exposure (tonnes) to each arrival port for the contemporary data (2015, 2016, 2017 and combined years). The median (dot) and 10% and 90% quantiles (upper and lower lines) of the prediction distribution are displayed.
As summarised in Table
Predicted biofouling exposure and 10–90% quantile for the ports with the highest average biofouling mass as illustrated in Fig.
Port | Predicted biofouling exposure (tonnes/year) | 10–90% quantile (tonnes/year) |
---|---|---|
Auckland | 70 | 58.4 – 86.02 |
Tauranga | 65.37 | 49.62 – 79.82 |
Lyttelton | 30.83 | 23.11 – 39.1 |
Napier | 26.48 | 18.81 – 36.22 |
Wellington | 25.81 | 15.44 – 37.48 |
Finally, for each arrival port, the biofouling exposure can be compared with the ballast discharge (Fig.
Predicted vessel ballast water discharge (kilotonnes) vs biofouling exposure (tonnes) per port. The figure displays the mean and 10% and 90% quantiles of the bootstrap distribution for ballast water and from the prediction distribution for biofouling. The values are the average of ballast water and biofouling mass over years 2015 – 2017.
Based on the relative likelihood of entry of marine NIS via these two pathways, the highest surveillance weights would be allocated to the ports Tauranga and Auckland, followed by Lyttelton, Napier, New Plymouth, and Wellington, whereas the lowest surveillance effort would be allocated to the ports Gisborne, Timaru, Bluff, and Picton.
The relative entry likelihood weighting, calculated from the mean values of the estimated prediction distributions and scaled for the respective ballast discharge and biofouling exposure, is provided in Table
Relative entry likelihood for each port for ballast water discharge and biofouling exposure, and relative surveillance effort allocation weights.
Port | Total ballast discharge (kilotonnes) | Total biofouling exposure (tonnes) | Relative entry likelihood | Relative allocation weight |
---|---|---|---|---|
Tauranga | 2,205,593 | 64.73 | 1 | 0.23 |
Auckland | 307,807 | 69.53 | 0.59 | 0.13 |
Lyttelton | 904,603 | 30.07 | 0.43 | 0.1 |
Napier | 823,096 | 26.19 | 0.38 | 0.09 |
New Plymouth | 1,042,600 | 9.80 | 0.31 | 0.07 |
Wellington | 288,243 | 25.49 | 0.25 | 0.06 |
Nelson | 534,271 | 16.38 | 0.24 | 0.06 |
Dunedin | 530,440 | 14.45 | 0.23 | 0.05 |
Whangarei | 609,168 | 8.51 | 0.2 | 0.05 |
Taharoa | 842,739 | 0.54 | 0.2 | 0.05 |
Gisborne | 581,713 | 4.03 | 0.16 | 0.04 |
Timaru | 247,965 | 12.82 | 0.15 | 0.04 |
Bluff | 213,933 | 9.05 | 0.11 | 0.03 |
Picton | 208,567 | 2.76 | 0.06 | 0.02 |
We developed a systematic methodology that can be used to adapt surveillance effort over time in response to changes in propagule pressure. The methods can be applied to port systems in other countries or to other invasion pathways. Summarised at the highest level, the study applied model-based estimation to activity data, where the activity data were contemporary transits of marine vessels. In the case of ballast discharge, the model-based estimation was non-parametric, using KNN to identify the vessel transit trajectories that were most similar to the contemporary records on the available subset of fields. In a sense, the contemporary transit data reweighted the historical ballast discharge records, and the significant changes in activity observed in Tauranga were reflected in its enhanced likelihood of marine NIS entry via this pathway. For the biofouling exposure, we predicted vessel-level biofouling using random forests, and its uncertainty using quantile random forests, and computed port exposure directly from the contemporary transit records. Key in each case was the availability of contemporary activity data that could be used to calibrate the relative likelihood of marine NIS entry experienced by each port. In this system as in many others, the likelihood of entry scales smoothly with the exposure.
Based on this study, Tauranga had the highest predicted joint ballast discharge volume and exposure to biofouling. This is despite Auckland receiving the highest number of vessel visits in almost all years (Suppl. material
Higher entry likelihood at Tauranga is likely related to this port having the highest number of discharge declarations, vessel visits and DWT. Tauranga was also visited with the highest number of bulk vessels, which along with containers had the highest discharge counts in the historical data. Considering that bulk vessels had the highest DWT and discharge volume among the vessel types, it is not surprising that their size and discharge volume contributed to their higher likelihood of marine NIS entry in Tauranga. Increase in ballast discharge in Tauranga, which reached its peak in 2016, was accompanied by the highest number of bulk carriers and containers visiting this port in 2016. Predicted ballast discharges were higher in 2016 mainly because the total annual number of vessels visiting New Zealand ports was highest in 2016; all the vessel types except for general cargo visited New Zealand’s ports more in 2016 and had a higher DWT in this year. The total number of visits decreased in 2017, along with the numbers of visits by almost all vessel types including those with high DWT. This can explain the decrease in ballast water predicted values in 2017.
In several studies, ballast discharge volume and frequency have been reported to vary significantly by ship type (
As noted earlier, unregulated ballast discharge is an important pathway for marine NIS spread. Consequently, information about ballast discharge quantities, along with detailed information on ballast discharge mitigation measures such as open-ocean exchange, would be valuable for risk analysis. This information would enable reporting of potentially risky discharge events, which could then be used to inform the ports that were most in need of surveillance. However, such records were not available for the contemporary data, and only intention-to-discharge records were available in the historical data.
This study predicted the amount of discharged ballast water and biofouling mass per port at the voyage level, and the ports where the discharge occurred. The predicted values of discharged ballast water and biofouling mass were used to review New Zealand’s marine biosecurity surveillance programme by estimating the relative likelihood of entry of marine NIS at each port and allocating surveillance effort between ports accordingly. Some studies have aimed to identify hot spots of marine NIS exchange via ballast water (Drake and Lodge, 2004), or possible source ports and countries for the arrival of a specific NIS by analysing international shipping networks and generating pathway simulations to help optimise shipping container inspection protocols (
This study has limitations. We did not incorporate other factors that might mitigate or exacerbate entry likelihood, such as the efficiency of ballast water exchange or treatment in removing marine NIS (
Specific policy outcomes regarding surveillance are beyond the scope of this paper. Nonetheless, the change in the pattern of vessel transits between the early 2000’s and the contemporary data show that the propagule pressure has sharply increased in Tauranga relative to the other ports, and is now similar to that of Auckland. Further, Napier is subject to similar levels of propagule pressure to locations currently included in the MHRSS programme.
We developed a systematic methodology that can be used to adapt surveillance effort over time in response to changing vessel arrival patterns and types, and that can be applied to maritime ports in other countries or to other invasion pathways. Aligning survey effort with marine NIS entry likelihoods will increase the likelihood of early detection and improve management outcomes. The systematic likelihood-based methodology designed here is flexible and can be applied to surveillance programmes at any time if changes to propagule pressure occur.
The authors acknowledge the financial and other support provided by the New Zealand Ministry for Primary Industries, the Australian Department of Agriculture, Water and the Environment, and the University of Melbourne through the Centre of Excellence for Biosecurity Risk Analysis, an initiative of the Australian government.
Supplementary materials
Data type: Docx file.
Explanation note: This is a file that contains the supplementary analyses.