Modelling the likelihood of entry of marine non-indigenous species from internationally arriving vessels to maritime ports: a case study using New Zealand data

The establishment of marine non-indigenous species (NIS) in new locations can degrade environmental, socio-cultural, and economic values. Vessels arriving from international waters are the main pathway for the entry of marine NIS, via exposure due to ballast water discharge (hereafter, ballast discharge) and biofouling. We developed a systematic statistical likelihood-based methodology to investigate port-level marine NIS propagule pressure from ballast discharge and biofouling exposure using a combination of techniques, namely k -Nearest-Neighbour and random forest algorithms. Vessel characteristics and travel patterns were assessed as candidate predictors. For the ballast discharge analysis, the predictors used for model building were vessel type, dead weight tonnage, and the port of first arrival; the predictors used for the biofouling analysis were days since last antifouling paint, mean vessel speed, dead weight tonnage, and hull niche area. Propagule pressure for both pathways was calculated at a voyage, port and annual level, which were used to establish the relative entry score for each port. The model was applied to a case study for New Zealand. Biosecurity New Zealand has commissioned targeted marine surveillance at selected ports since 2002 to enable early detection of newly arrived marine NIS (Marine High-Risk Site Surveillance, MHRSS). The reported methodology was used to compare contemporary entry likelihoods between New Zealand ports. The results suggested that Tauranga now receives the highest volume of discharged ballast water and has the second most biofouling exposure compared to all other New Zealand ports. Auckland was predicted to receive the highest biofouling mass and was ranked tenth for ballast discharge exposure. Lyttelton, Napier, and New Plymouth also had a high relative ranking for these two pathways. The outputs from this study will inform the refinement of the MHRSS programme, facilitating continued early detection and cost-effective management to support New Zealand’s wider marine biosecurity system. More generally, this paper develops an approach for using statistical models to estimate relative likelihoods of entry of marine NIS. predicted the amount of discharged ballast water and biofouling mass per port at the voyage level, and the ports where the discharge occurred. The predicted values of discharged ballast water and biofouling mass were used to review New Zealand’s marine biosecurity surveillance programme by estimating the relative likelihood of entry of marine NIS at each port and allocating surveillance effort between ports accordingly. Some studies have to identify hot spots of marine NIS exchange via and Lodge, 2004), or possible source ports and for the arrival of a specific NIS by analysing international shipping networks and generating pathway simulations to help optimise shipping container inspection protocols This study differs from previous studies in that a combination vessel traffic patterns and vessel characteristics was used to identify ports likely to receive marine NIS, whilst including a measure of propagule pressure associated with each vessel arrival. Other studies tried to prioritise locations for track-ing biological invasions, but used less exhaustive analyses and were more descriptive to the utility of the database (e.g., Molnar et al. 2008) or to prioritise the pathways that pose the greatest threat McGee et al. 2006). In comparison, we developed a systematic methodology that can be used to adapt surveillance effort over time in response to changing vessel profiles and can be applied to ports in other countries or The techniques used in this study allowed us to consider multiple discharges during a voyage for ballast water analysis, and to account for biofouling presence and while predicting voyage-level, port-level, and annual propagule pressure.


Introduction
Marine ecosystems are vulnerable to the introduction of non-indigenous species (NIS) through human activities such as international trade and transportation, which have increased greatly in the last fifty years (Hulme 2009). International marine shipping, which contributes to 90% of world trade, is a leading source of introduction of many marine NIS (Hewitt et al. 2009;Hulme 2009; International Maritime Organization [IMO] 2017). Marine NIS have the potential to impact upon environmental, sociocultural, and economic values. For example, in 1991, the outbreak of a virulent strain of cholera infected one million people and claimed thousands of lives and was linked to the discharge of ballast water in Peru (McCarthy and Khambaty 1994;Tauxe et al. 1995;Kolar and Lodge 2001). Therefore, it is essential to describe existing patterns of invasion and forecast future invasions for improving management of marine ecosystems (Drake and Lodge 2004).
The two main pathways associated with the spread of marine NIS via shipping are ballast water discharge (hereafter, ballast discharge) and biofouling exposure . Until recently, ballast water was thought to be the major contributor to marine NIS spread; however, the biofouling pathway has been assessed to be as great, if not a higher threat (Cordell et al. 2009;Bell et al. 2011). In New Zealand, it has been estimated that between 69-90% of established marine NIS are likely to have been introduced via biofouling, with ballast discharge being the second most important pathway (Cranfield et al. 1998;Kospartov et al. 2008). International and national policies have been implemented to minimise the spread of marine NIS through ballast water and biofouling, and New Zealand is currently the only country to have enacted mandatory biofouling regulations (Georgiades et al. 2020).
The use of predictive models can be a powerful tool for helping to understand the dynamics of invasion pathways through the exploration of factors that influence propagule pressure . Variables that influence propagule pressure can be classified as voyage properties (e.g., duration, source region, visited locations, arrival date or season), vessel characteristics (e.g., size, ballast water capacity, niche area type and extent) and behaviours (e.g., compliance with regulations), all of which influence the likelihood of entry for both pathways (McGee et al. 2006;Lacoursière-Roussel et al. 2012;Minton et al. 2015).
Many studies have identified high-risk invasion pathways (Cope et al. 2015), global hotspots of invasions (Drake and Lodge 2004), possible source ports of invasion (Paini and Yemshanov 2012), and high-risk vessels arriving from international ports (Clarke et al. 2017). However, few studies have established how information about vessel traffic patterns and vessel characteristics for ballast water and biofouling pathways can be integrated to identify ports most likely to receive marine NIS. We demonstrate the use of an integrative modelling approach using the data available from New Zealand, to investigate propagule pressure differences among ports.
The Marine High-Risk Site Surveillance (MHRSS) Programme was established in 2002 to facilitate the early detection of selected marine NIS that were considered likely to impact on New Zealand's marine values. Currently, 11 ports and marinas are considered 'high-risk' for marine NIS entry and establishment (Seaward et al. 2015). The original site selection was based on proxy indicators using data from 1999, e.g., vessel visits and port habitat types (see Inglis 2001a, b, Inglis et al. 2006. Since the inception of this programme, changes in vessel movement patterns and behaviours are likely, and re-evaluation is appropriate. At the time of the study, the allocation of surveillance effort was identical to all ports except at Auckland, at which it was doubled.
The aim of this research is to develop a systematic statistical likelihood-based approach for predicting marine invasions. We investigate whether the ports and marinas currently surveyed as part of the MHRSS programme are most likely the locations where marine NIS will enter New Zealand with internationally arriving vessels via the ballast and biofouling pathways. We then assess whether the allocation of survey effort at each location should be modified based on these findings.

Methods
Datasets were provided by Biosecurity New Zealand (BNZ), the government agency responsible for New Zealand's biosecurity system. The main challenge in predicting the hazard presented by ballast discharge for the current case study is that contemporary ballast discharge records were not available. BNZ collected ballast discharge intention declaration data up until 2008, but risk analysts were concerned that the pattern and volume of transits may have changed substantially, hence the need to connect the historical and contemporary data described below.
The challenge in predicting the hazard due to biofouling exposure is to construct a model that permits prediction of the biofouling level based on easily captured vessel statistics. As described below, we used data from a historical biofouling study to construct this model.
For vessel ballast water and biofouling pathways we used two sets of historical and contemporary data as described below.

Ballast discharge
A historical dataset of biofouling and ballast water declarations was compiled for vessels that visited New Zealand during 1998 to 2008, which contained records for 15,745 voyages by 2,607 vessels. This dataset contained information on the vessel's physical and voyage characteristics, and ballast discharge volumes detailed for each ballast tank. This dataset was examined to determine: (i) unique voyage and port visits by vessels and volumes of ballast discharged per vessel at each port, and (ii) management of the ballast and intention to discharge in New Zealand. The response variables used for ballast water modelling were the port of discharge and discharge volume per port. Vessel type, ballast capacity or Deadweight Tonnage (DWT), arrival port, and the intention of discharge were candidate predictor variables for estimating how much ballast water was discharged at each port. Arrival port is the first New Zealand port a vessel encounters in a voyage. Many voyages declared more than one port of ballast discharge, i.e., each journey has a first arrival port, but the vessel might visit other ports as part of the same voyage and discharge ballast. For example, a vessel's voyage to New Zealand may include visits to Whangarei, Tauranga and Auckland, with intended ballast discharge at Whangarei and Tauranga but not Auckland; here the recorded intended ballast discharge volumes for the first two ports (i.e., those at which ballast was discharged) were tallied. The historical data contained 15,628 unique voyages out of which 1,196 journeys had more than one port of ballast discharge. Out of 17,703 stops at ports of arrival, there were 6,287 incidents of reported ballast discharge. Note that the discharge may have presented an acceptable biosecurity risk, if the discharge were carried out in a regulated way, for example after open-ocean exchange. 'Intent to discharge' was not used as a predictor because there was poor agreement between the declared intention to discharge and the reported discharge volume. More information about this agreement can be found in the Suppl. material 1: Table S1 in the SM summarises the description of the variables used in the historical dataset for the ballast water analysis.

Biofouling
Biofouling on the submerged surfaces of 322 international merchant vessels was sampled by Inglis et al. (2010) upon arrival to New Zealand. This included 166 container and general cargo vessels, 49 passenger vessels, 37 roll-on roll-off (RoRo) / car carriers, 31 bulk carriers, 21refrigerated cargo ships, 12 tankers and 6 vessels of other miscellaneous class types. A standardised sampling design was used to collect samples of biofouling from the hull and external niche area surfaces of the vessels. A questionnaire administered to the vessel's master provided information on the design features of the vessel, hull maintenance record (e.g., age and type of antifouling coatings), and recent voyage history.
Data on the presence and mass of biofouling from 314 of the merchant vessels and associated data from the questionnaire were used to build the biofouling model. As the contemporary arrival data to which the model would predict contained only limited information on the recent voyage history of the vessels, the models were fitted using variables from the historic study that could be easily matched to contemporary information collected by BNZ. They included design specifications of the vessel (e.g., class type, design speed), vessel age, niche area, the age of antifouling coatings on arrival, and the maximum periods of inactivity (lay-up) in any location. For biofouling data, the main two response variables were biomass of biofouling per square metre and presence/absence of any biofouling. The latter had a value of 1 whenever biofouling was present, and zero if there were no biofouling. This binary variable was used to correct mass; that is, whenever mass had a zero value, but biofouling was present, a correction value of 0.001 g was added to the mass. Predictor variables in the historical biofouling data were the age of antifouling coating, vessel type, vessel age, vessel speed, season, niche, and maximum duration of lay-up. The description of the variables and their ranges are given in the Suppl. material 1: Table S2.

Data on contemporary shipping
Contemporary data sourced from BNZ on international vessel arrivals into New Zealand were a combination of two data sets. The first included data on vessel arrivals between January 2015 to December 2017, and was derived from mandatory pre-arrival documentation provided by the vessel master to BNZ (Ministry for Primary Industries 2019). These data were originally collected to aid in the tactical evaluation of biosecurity risks. The second data set included port-by-port arrivals by merchant vessels, including the ports of first arrival to New Zealand and coastwise voyages for the year 2016. These data were purchased from the Lloyd's Maritime Intelligence Unit, and used to verify the completeness of the BNZ data. Details of all data preparation for the analysis are provided in the Suppl. material 1 -part 1. The same contemporary data were used for both ballast water and biofouling analysis, but a different set of predictors were selected for each.

Model construction
The ballast discharge model was used to predict the total annual volume of ballast water discharged at each New Zealand port. The contemporary data lacked ballast discharge information, so we had to predict it from historical data. The models were built to predict ballast discharge at each port as a function of arrival port and ship characteristics.
The historical biofouling data from Inglis et al. (2010) were used to fit a model that predicted the presence and biomass of biofouling on vessels that arrived in New Zealand between 2015 and 2017. The total mass of biofouling on a vessel was estimated as the product of its estimated mass per unit niche area (i.e., density) and the total niche area for the vessel. A model was constructed from historical data to predict the presence and per unit area of biofouling in niche areas of the vessels as a function of vessel characteristics such as the age of antifouling coatings on the vessel, and the frequency and duration of lay-up periods. As most biofouling occurs in niche areas (Inglis et al. 2010;Davidson et al. 2016), the total mass of biofouling on a vessel was estimated by calculating the total area of niches on each vessel using relationships developed by Moser et al. (2016) and scaling the predicted density for the entire vessel hull area. This model was then applied to the contemporary vessel data to predict port-level exposure to biofouling. Additional details about the modelling approach can be found in the Suppl. material 1 -part 3.

Ballast water
This section reports construction of an algorithm to predict the volume (and at which ports) a vessel will discharge their ballast water, given their vessel-level characteristics (e.g., type, DWT, Ballast capacity) and journey-level information. The complex aspect of this endeavour is that after entering New Zealand's waters, each vessel may visit more than one port, with the possibility of discharging ballast water at one or more of the ports visited. As a result of discharging at one or more ports, the variable of interest is structured as a vector for each vessel, so any algorithms investigated are required to take this vector-based outcome into account.

k-Nearest-Neighbours
An algorithm based on k-Nearest Neighbours (KNN, Fix and Hodges 1951) was developed to simultaneously predict which ports a vessel may visit and the subsequent discharge amount. For each journey in the contemporary data, this algorithm finds the k most similar journeys in the historical data based on some measure of similarity and uses the average ballast discharge from those journeys as a prediction of the discharge for the contemporary journey.
Denoted by X the n × p matrix of data used for model training (the training data is the historical data as described in the Suppl. material 1 -part 1), where n is the number of voyages (observations) and p is the number of fields (predictors) in the data; let Y be the corresponding n × v matrix of discharge volumes, where v is the number of ports. Let z j be a vector of length p, containing the predictors for the j th voyage in the contemporary data, for which we wish to predict ballast discharge,ŷj, at each of the ports. KNN algorithm as applied to ballast discharge data can be explained as below: 1. Repeat for j = 1,…,m, where m is the number of voyages in the contemporary data for which we wish to predict: a) Calculate D(z j , X), the distance from z j to each of the rows xi, i = 1, …, n in the training matrix X.
b) Find S, the set of the k th smallest distance in D (if there are ties, keep these in the set).
i.e., the predicted discharge values are the average of the k th nearest neighbours discharges.
2. For each year t in the contemporary data: a) Find S t [v] , the set of all voyages arriving at port B̂t [v] is the total predicted ballast discharge in port v during year t. The KNN algorithm does not work for categorical predictors, as there is no defined distance metric for mixtures of categorical and numerical predictors. To allow for categorical predictors (such as vessel type), the KNN algorithm was implemented on numerical predictor matrices created within each level of a category. For example, to use vessel type and dead weight tonnage as predictors, the algorithm was run on the matrix X formed from the subset of voyages from each level of vessel type until all predictions were made.
The historical data were structured such that each row of the dataset represented a single event in a vessel's voyage. For example, one row may consist of the arrival event, with no discharge, and the next row a discharge event for the same vessel at a different port. It was important that if a vessel did not discharge in a port, then this non-event was not recorded. In order to have a complete vector response for each voyage, it was reasonably assumed that non-events resulted in a 0 discharge being recorded. An example of this transformation is provided in Suppl. material 1: Fig. S2 in the Suppl. material 1 -part 4.

Choice of predictors and number of nearest neighbours, k
The choice of which predictors to include, and how many nearest neighbours to use in the prediction was made using cross-validation. Given our prediction target of total ballast water volume at a port within a year, our cross-validation strategy was to leave out a single year at a time to form the testing set, with the remaining year's data forming the training set. A schematic of cross-validation used for KNN analysis is given in Suppl. material 1: Fig. S3 in the Suppl. material 1 -part 4.

Model validation
In order to provide an unbiased validation of the chosen model and help understand its operational characteristics, the data were split into two: model-fitting data comprising all voyages within the years 1999 to 2005 (inclusive) and validation data comprising all voyages within the years 2006 and 2007.

Prediction errors
To estimate the prediction error from the KNN algorithm, a bootstrap procedure was used. Voyages were randomly sampled with replacement from the training data, and the KNN algorithm fit on each bootstrap sample. The standard deviation (and mean/ median) of total discharge within a port was calculated from 100 bootstrap resampled values of total discharge.

Model selection
Several different modeling approaches were applied to the historical data in the process of model selection to estimate biofouling biomass for the contemporary dataset (Suppl. material 1 -part 6). Random forests provided the best model to obtain the prediction (Breiman 2001;Liaw and Wiener 2002;see Hastie et al. 2017 for a discussion) and we used quantile-regression random forests to obtain the distribution to represent the prediction uncertainty (Meinshausen 2017).
Random forests analysis was applied to the data because the algorithm is indifferent to the conditional distribution of the response variable, meaning that no special provisions need to be taken to allow for the heavy zero inflation. The random forest model explained about 15% of the variation in the biofouling mass, which was better than its alternatives. Quantile random forests was used to obtain journey-specific prediction distributions which could then be aggregated to obtain overall levels of uncertainty for the exposure of each port to biofouling for each year.

Model building for biofouling biomass
The likelihood of entry via biofouling was assumed to be proportional to the biofouling exposure at each port, each year. For a single vessel, this was assumed to be the predicted total mass of biofouling in the niche areas of the vessel (not including the hull as a niche area). However, the contemporary data did not contain biofouling mass measurements for each vessel but did contain measurements for each vessel's niche area in metres squared. To handle this, two models were constructed: a model to predict the biofouling mass per vessel as a function of several variables, and another model to predict the quantiles of the prediction distribution. The random forest model used the following continuous predictor variables: dry weight tonnage, days since last antifouling paint, niche area, and mean speed. Vessel type was assessed and was excluded from the models because the improvement of fit was insufficient.
To predict the density of biofouling for each vessel in the contemporary data, missing data were first handled using multivariate imputation via chained equations (Buuren and Groothuis-Oudshoorn 2010) to make five complete variant datasets, which were analysed in parallel. The random forest biofouling presence model was applied to each contemporary vessel movement, which gives an estimate of the quantum of biofouling. Then the prediction distributions were estimated using the quantile random forest. Once the density of biomass present on each vessel was predicted, it was multiplied by the niche area of the vessel to give a total niche biomass prediction, which was then used in the allocation procedure.

Ranking ports by likelihood of marine NIS entry
We calculated an entry likelihood score from ballast discharge and biofouling propagule pressure on a scale of 0 (lowest) to 1 (highest). That is, the mean values from the estimated prediction distribution were scaled to each of the respective discharge and biofouling amounts by the maximum amount within the ports, which sets the scaled score to 1 for the port with the highest amount. We did not subtract the smallest so that no port was allocated a weight of 0. These scores were then averaged to give the relative entry likelihood score for each port. Weights were then calculated in proportion to these relative scores. The weights sum to 1 across the 15 ports and could, therefore, be used to allocate total surveillance effort among ports relative to the likelihood of entry of marine NIS for each port.

Ballast discharge
We fitted the KNN model using all combinations of vessel type and arrival port as categorical variables, DWT or date of arrival as the nearest neighbour matrix, and k = 1,3,5 nearest neighbours to the model data set. Results of the cross-validation in Table 5 (Suppl. material 1 -part 7) showed that the KNN that grouped by both port of arrival and vessel type included DWT for calculating nearest neighbours, and using five nearest neighbours performed the best, i.e., this combination of distance variables had the smallest RMSE value (88732.8 for k = 5). A dataset of years 2006-2007 was used to test KNN validation by comparing the mean of predicted ballast water values (as predicted by the bootstrap) with the true values (Suppl. material 1: Fig. S18 in Suppl. material 1 -part 8). Based on the results of the KNN validation summarised in this Fig., the KNN models performed reasonably well for prediction; the observations follow the 1:1 line and double standard deviation lines for the port-level discharge mostly cross the line.

Predicting contemporary ballast discharge
The best model arising from the cross-validation fitting study of the KNN algorithm as selected was used to predict contemporary ballast discharge per port. For this application, the full historical data (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007) were used for training. Fig. 1 shows the mean and 10% and 90% quantiles as estimated by bootstrapping. Tauranga, with an average of 2255.73 ± 292.86 kilotons, had the highest predicted average ballast discharge across the years 2015-2017, followed by New Plymouth (1025.22 ± 231.89), Lyttelton (907.44 ± 129.39), Taharoa (831.84 ± 242.45), Napier (812.54 ± 191.47), Whangarei (624.92 ± 165.55), and Gisborne (575.33 ± 160.74). For a port receiving a high number of vessels, Auckland had a low volume of discharge by comparison. Following an increase in 2016, the predicted ballast discharge decreased in 2017 in almost all ports (Fig. 1). The lowest predicted values for ballast discharge were observed in Milford Sound, Picton and Bluff, and zero for Westport.

Biofouling
Based on the Random Forest statistics, e.g., Gini impurity index and MSE (mean squared error), important variables for biofouling prediction were days since last antifouling paint, niche area, vessel speed and DWT. Variability in the modelling was accounted for by predicting quantiles of the vessel biomass (Fig. 2).
As summarised in Table 1, Auckland had the highest average predicted biofouling exposure, followed by Tauranga, Lyttelton, Napier, and Wellington. Ports with very low biofouling exposure were Whangarei, Gisborne, Picton, and Taharoa. Fig. 3 shows the variation of the actual biofouling and the prediction distribution, which demonstrates that the model is not of particularly good quality. That is, some of the data points, particularly for higher values of biofouling mass, are well below the 1:1 line, indicating an underestimation of higher values, which reflects regression to the mean. Ballast water and biofouling exposure per port Finally, for each arrival port, the biofouling exposure can be compared with the ballast discharge (Fig. 4). This shows that the arrival ports of Auckland and Tauranga have the highest exposure for biofouling, and Tauranga has the highest volume of ballast discharge (i.e., highest entry likelihoods). The next cluster of ports comprises New Plymouth, which had the second highest ballast discharge but received much lower biofouling exposure than many other ports; Wellington, which had low volume of ballast discharge and high exposure to biofouling; and Lyttelton and Napier, which seemed to have relatively high levels of exposure to both ballast discharge and biofouling. The balance of ports have relatively low exposure.
Based on the relative likelihood of entry of marine NIS via these two pathways, the highest surveillance weights would be allocated to the ports Tauranga and    Auckland, followed by Lyttelton, Napier, New Plymouth, and Wellington, whereas the lowest surveillance effort would be allocated to the ports Gisborne, Timaru, Bluff, and Picton.

Priority ports for marine surveillance
The relative entry likelihood weighting, calculated from the mean values of the estimated prediction distributions and scaled for the respective ballast discharge and biofouling exposure, is provided in Table 2. Based on these results, the ports with higher likelihood of marine NIS entry were Tauranga, Auckland, Lyttelton, Napier, and New Plymouth with scores of 1, 0.59, 0.43, 0.38, and 0.31, respectively. Tauranga was assigned a relative allocation weight of 0.23, followed by Auckland (0.13), Lyttelton (0.1), and Napier (0.09). New Plymouth, Wellington, Nelson, Dunedin, Whangarei, and Taharoa were ranked next based on their surveillance allocation weights. Gisborne, Timaru, Bluff, and Picton had the lowest relative weighted scores. There was a positive relationship between the number of vessel arrivals per port and the entry likelihood scores (r 2 = 0.75). Ports where the entry likelihood is greater than might be predicted by numbers of arrivals alone include Tauranga, Napier, New Plymouth and Taharoa, where there are relatively large proportions of bulk carriers or tankers. Ports where entry likelihood seems lower include Auckland and Wellington. To investigate the variability of the ranking of sites, this allocation was also performed to each of the bootstrap replicates. Fig. 5 shows the histogram of ranks from this procedure.

Discussion
We developed a systematic methodology that can be used to adapt surveillance effort over time in response to changes in propagule pressure. The methods can be applied to port systems in other countries or to other invasion pathways. Summarised at the highest level, the study applied model-based estimation to activity data, where the activity data were contemporary transits of marine vessels. In the case of ballast discharge, the model-based estimation was non-parametric, using KNN to identify the vessel transit trajectories that were most similar to the contemporary records on the available subset of fields. In a sense, the contemporary transit data reweighted the historical ballast discharge records, and the significant changes in activity observed in Tauranga were reflected in its enhanced likelihood of marine NIS entry via this pathway. For the biofouling exposure, we predicted vessel-level biofouling using random forests, and its uncertainty using quantile random forests, and computed port exposure directly from the contemporary transit records. Key in each case was the availability of contemporary activity data that could be used to calibrate the relative likelihood of marine NIS entry experienced by each port. In this system as in many others, the likelihood of entry scales smoothly with the exposure. Based on this study, Tauranga had the highest predicted joint ballast discharge volume and exposure to biofouling. This is despite Auckland receiving the highest number of vessel visits in almost all years (Suppl. material 1: Fig. S5). It has previously been shown that total number of visits alone is not a reliable predictor of propagule pressure , and our results reinforce this outcome. Although the busiest ports -such as Auckland, Tauranga, and Lyttelton -had the highest relative exposure to ballast discharge and biofouling, other highly visited ports such as Wellington and Nelson had lower levels when compared to less visited ports such as Napier and New Plymouth. Several studies have relied on vessel visits as a proxy for propagule pressure in ballast water (Drake and Lodge 2004) and this might be a source of error due to significant differences between vessel types and their characterises and operational profiles (Minton et al. 2005;Verling et al. 2005). For example, Carney et al. (2017) reported that the increase in annual ballast discharge was not because of an increase in total vessel arrival numbers, but instead resulted from an increase in bulk carrier traffic. Thus, it seems that considering vessel type, size and discharge capacity will provide a better predicator of ballast discharge volume for the vessel.
Higher entry likelihood at Tauranga is likely related to this port having the highest number of discharge declarations, vessel visits and DWT. Tauranga was also visited with the highest number of bulk vessels, which along with containers had the highest discharge counts in the historical data. Considering that bulk vessels had the highest DWT and discharge volume among the vessel types, it is not surprising that their size and discharge volume contributed to their higher likelihood of marine NIS entry in Tauranga. Increase in ballast discharge in Tauranga, which reached its peak in 2016, was accompanied by the highest number of bulk carriers and containers visiting this port in 2016. Predicted ballast discharges were higher in 2016 mainly because the total annual number of vessels visiting New Zealand ports was highest in 2016; all the vessel types except for general cargo visited New Zealand's ports more in 2016 and had a higher DWT in this year. The total number of visits decreased in 2017, along with the numbers of visits by almost all vessel types including those with high DWT. This can explain the decrease in ballast water predicted values in 2017.
In several studies, ballast discharge volume and frequency have been reported to vary significantly by ship type Minton et al. 2015), with the largest volumes of ballast water from bulkers and tankers (McGee et al. 2006;Cordell et al. 2009;Carney et al. 2017). Vessel type was consistently suggested as a reliable predictor of biofouling, as different vessels vary in the size and number of niche areas (Lacoursière-Roussel et al. 2012;Lane et al. 2018). Based on the results of cross-validation and model fit statistics, vessel type was not included as a variable in our models, but the variation may have been more efficiently represented by the included predictors, e.g., DWT. In this study, niche area and days since antifouling paint were the two most important variables explaining the variation of biofouling mass. Time since last cleaning and last antifouling paint also played a role in the increased likelihood of entry of marine NIS from biofouling (Floerl and Inglis 2005;Drake and Lodge 2007;Lacoursière-Roussel et al. 2012). Biofouling mass on the vessel might also vary depending on the other factors such as vessel speed as lower speeds and inactivity appear to give the fouling communities opportunity to settle (Minchin and Gollasch 2003).
As noted earlier, unregulated ballast discharge is an important pathway for marine NIS spread. Consequently, information about ballast discharge quantities, along with detailed information on ballast discharge mitigation measures such as open-ocean ex-change, would be valuable for risk analysis. This information would enable reporting of potentially risky discharge events, which could then be used to inform the ports that were most in need of surveillance. However, such records were not available for the contemporary data, and only intention-to-discharge records were available in the historical data.
This study predicted the amount of discharged ballast water and biofouling mass per port at the voyage level, and the ports where the discharge occurred. The predicted values of discharged ballast water and biofouling mass were used to review New Zealand's marine biosecurity surveillance programme by estimating the relative likelihood of entry of marine NIS at each port and allocating surveillance effort between ports accordingly. Some studies have aimed to identify hot spots of marine NIS exchange via ballast water (Drake and Lodge, 2004), or possible source ports and countries for the arrival of a specific NIS by analysing international shipping networks and generating pathway simulations to help optimise shipping container inspection protocols (Paini and Yemshanov 2012). This study differs from previous studies in that a combination of information about vessel traffic patterns and vessel characteristics was used to identify ports likely to receive marine NIS, whilst including a measure of propagule pressure associated with each vessel arrival. Other studies tried to prioritise locations for tracking biological invasions, but used less exhaustive analyses and were more descriptive to illustrate the utility of the database (e.g., Molnar et al. 2008) or to prioritise the pathways that pose the greatest threat (e.g., McGee et al. 2006). In comparison, we developed a systematic methodology that can be used to adapt surveillance effort over time in response to changing vessel profiles and can be applied to ports in other countries or globally. The techniques used in this study allowed us to consider multiple discharges during a voyage for ballast water analysis, and to account for biofouling presence and mass while predicting voyage-level, port-level, and annual propagule pressure.
This study has limitations. We did not incorporate other factors that might mitigate or exacerbate entry likelihood, such as the efficiency of ballast water exchange or treatment in removing marine NIS (Minton et al. 2005), or the influence of transit time on ballast water organism survival prior to discharge (Cordell et al. 2009). Our analysis gave equal weighting to the importance of ballast water and biofouling as pathways for the introduction of marine NIS. Although numerous studies have attempted to summarise the relative importance of these and other pathways for the introduction of marine NIS based on records of established species, such studies typically include many historical (e.g., pre-1950) introduction records, hence the numbers of marine NIS attributed to each pathway may not reflect contemporary patterns of transport. Moreover, many marine NIS can be transported within ballast water and as biofouling, making it difficult to distinguish the relative contribution of each pathway (Fofonoff et al. 2003). Drake and Lodge (2007) estimated the abundance of fouling organisms on a typical merchant vessel to be between 5.85 × 10 5 and 6.19 × 10 6 individuals, which is within the range of abundance of organisms contained in untreated ballast discharges (2.63 × 10 5 -9.83 × 10 6 individuals; Minton et al. (2005)). While these gross metrics of per vessel propagule pressure are comparable, it is unclear whether organisms are equally likely to be introduced, or if introductions by each pathway have the same likelihood of survival and establishment. In the absence of this information the standardised pathway weightings seem appropriate. In aligning surveillance with contemporary patterns of international shipping it is also important to consider other influences on the allocation of survey effort. For example, our analysis did not include port arrivals by small ("recreational") craft, which also carry burdens of biofouling (Floerl et al. 2008;Lane et al. 2018). More than 600 small marine craft enter New Zealand waters each year, with around two thirds of these clearing Customs in Opua. Other significant places of first arrival for small craft are Auckland, Whangarei and Picton which, with Opua, account for more than 90% of annual arrivals. We also did not consider the relationship between surveillance effort and detection probability at the scale of the port, which can be influenced by the size and dispersion of habitats suitable for establishment by marine NIS. For example, the total area of Waitematā Harbour, Auckland, is at least 3 × that of Nelson Harbour and contains the Port of Auckland and 11 marinas for small craft which are distributed from the mouth to the head of the harbour (Morrisey et al. 2012). Gaining equitable survey sensitivity across the surveyed ports requires a proportionately larger sample effort in larger harbours.
Specific policy outcomes regarding surveillance are beyond the scope of this paper. Nonetheless, the change in the pattern of vessel transits between the early 2000's and the contemporary data show that the propagule pressure has sharply increased in Tauranga relative to the other ports, and is now similar to that of Auckland. Further, Napier is subject to similar levels of propagule pressure to locations currently included in the MHRSS programme.

Conclusions
We developed a systematic methodology that can be used to adapt surveillance effort over time in response to changing vessel arrival patterns and types, and that can be applied to maritime ports in other countries or to other invasion pathways. Aligning survey effort with marine NIS entry likelihoods will increase the likelihood of early detection and improve management outcomes. The systematic likelihood-based methodology designed here is flexible and can be applied to surveillance programmes at any time if changes to propagule pressure occur.