Uncertainty and sensitivity analysis in quantitative pest risk assessments; practical rules for risk assessors

Quantitative models have several advantages compared to qualitative methods for pest risk assessments (PRA). Quantitative models do not require the definition of categorical ratings and can be used to compute numerical probabilities of entry and establishment, and to quantify spread and impact. These models are powerful tools, but they include several sources of uncertainty that need to be taken into account by risk assessors and communicated to decision makers. Uncertainty analysis (UA) and sensitivity analysis (SA) are useful for analyzing uncertainty in models used in PRA, and are becoming more popular. However, these techniques should be applied with caution because several factors may influence their results. In this paper, a brief overview of methods of UA and SA is given. In addition, a series of practical rules is defined that risk assessors can follow to improve the reliability of UA and SA results. These rules are illustrated in a case study based on the infection model of Magarey et al. (2005), where the results of UA and SA are shown to be highly dependent on the assumptions made about the probability distributions of the model inputs.


Introduction
Different types of mathematical models are commonly used for pest risk analysis. Some models are used to calculate the probability of entry (e.g., Roberts et al. 1998). Others are used to estimate pest establishment potential (e.g., Dupin et al. 2011; Phillips et al. 2006; Roura-Pascual et al. 2009; Sutherst 2003; Webber et al. 2011; Young et al. 1999). Models are also used to simulate spread (e.g., Pitt et al. 2010; Robinet et al. 2012) or pest impacts under different scenarios (e.g., Stansbury et al. 2002; Cook et al. 2012; Kriticos et al. 2013). These models are powerful tools, but they include several sources of uncertainty that need to be taken into account by risk assessors and communicated to decision makers, namely uncertainty associated with input variables, with parameter values estimated from expert knowledge, with parameter values estimated from data, and with equations (for example, uncertainty about the best equation to use for a given model application).
Uncertainty and sensitivity analysis are two techniques for evaluating models. Although the two techniques are often conflated, they have different purposes. Uncertainty analysis (UA) comprises a quantitative evaluation of uncertainty in model components, such as the input variables and parameters for a given situation, to determine an uncertainty distribution for each output variable rather than a single value (Monod et al. 2006; Vose 2000; de Rocquigny et al. 2008). Uncertainty in input variables and parameters is usually described using probability distributions. The objective of an uncertainty analysis is to study the consequences of this uncertainty by computing a probability distribution of the model output from the set of probability distributions of the model inputs. UA aims to answer the following question: "What is the uncertainty associated with the output resulting from the uncertainty associated with the inputs?" The use of formal uncertainty analysis was recently considered one of the most important accomplishments in risk analysis since the 1980s (Greenberg et al. 2012). Uncertainty analysis allows one to take uncertainty into account when calculating an output variable of interest (e.g., the number of spores entering a given area, Peterson et al. 2009). Uncertainty analysis should be a key component of model-based risk analysis because it provides risk assessors and decision makers with information about how uncertain the model outputs are.
The main purpose of sensitivity analysis (SA) is to determine how sensitive the output of a model is to the elements of the model that are subject to uncertainty. The objective of a sensitivity analysis is to rank uncertain inputs according to their influence on the output. Sensitivity analysis can be seen as an extension of uncertainty analysis. Its purpose is to answer the question "What are the most important uncertain inputs?" Sometimes SA is also used for a more general purpose, for example to understand how the model behaves when some input or parameter values are changed.
Uncertainty and sensitivity analysis are becoming more popular, especially due to the development of Bayesian methods and of specialized software and packages (e.g., the sensitivity package of R). However, these techniques should be applied with caution because several factors may influence their results (de Rocquigny et al. 2008; Saltelli et al. 2008), such that in some cases the validity of conclusions derived from UA or SA may be limited. In this paper, a brief overview of methods of UA and SA is given. Then, a series of practical rules that risk assessors can follow to improve the reliability of UA and SA results is defined. These rules are illustrated with the infection model of Magarey et al. (2005).

Brief overview of methods for uncertainty and sensitivity analysis
For some simple models, it is possible to calculate the exact probability distribution of the model output from the probability distributions of the uncertain input variables and/or parameters. However, in most cases, the probability distribution cannot be calculated analytically and other methods must be used. One method is to linearise the model using its derivatives, in other words the derivatives of the model output with respect to its inputs and parameters. If the uncertain factors are all assumed to be normally distributed, the probability distribution of the linearised model can be derived analytically: it is a normal distribution whose mean and variance are functions of the means and variances (and covariances) of the uncertain factors. A limitation of this method is that its application is restricted to cases where the uncertain factors are in fact normally distributed. It is sometimes more appropriate to use other distributions, especially when the random variables are discrete or bounded. Another limitation is that this method can be unreliable when the linear approximation is not accurate. For these reasons, we recommend the four-step method described below, which is based on Monte Carlo simulations and adapted from de Rocquigny et al. (2008).
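The second limitation of linearisation can be made concrete with a toy example. The sketch below is not part of the original study: the function g(x1, x2) = exp(x1) + x2 and its Gaussian inputs are assumptions chosen only because the exact output mean and variance are known. The linearised (first-order) approximation gives a mean of 2 and a variance of about 0.26, whereas the exact values are exp(0.125) + 1 ≈ 2.13 and about 0.37; a Monte Carlo sample recovers the exact values.

```python
import numpy as np

def g(x):
    """Toy nonlinear model (hypothetical): y = exp(x1) + x2, vectorised over the last axis."""
    x = np.asarray(x, dtype=float)
    return np.exp(x[..., 0]) + x[..., 1]

def linearised_moments(model, mean, cov, eps=1e-6):
    """First-order approximation: E[y] ~ g(mu), Var[y] ~ grad' Sigma grad."""
    mean = np.asarray(mean, dtype=float)
    grad = np.empty_like(mean)
    for i in range(mean.size):
        step = np.zeros_like(mean)
        step[i] = eps
        # central finite difference for the partial derivative of y w.r.t. input i
        grad[i] = (model(mean + step) - model(mean - step)) / (2 * eps)
    return float(model(mean)), float(grad @ np.asarray(cov) @ grad)

def monte_carlo_moments(model, mean, cov, n=200_000, seed=1):
    """Reference values from simple random sampling of the Gaussian inputs."""
    rng = np.random.default_rng(seed)
    y = model(rng.multivariate_normal(mean, cov, size=n))
    return float(y.mean()), float(y.var(ddof=1))

mu = [0.0, 1.0]
sigma = [[0.25, 0.0],
         [0.0, 0.01]]
lin_mean, lin_var = linearised_moments(g, mu, sigma)
mc_mean, mc_var = monte_carlo_moments(g, mu, sigma)
print(f"linearised:  mean={lin_mean:.3f}, var={lin_var:.3f}")
print(f"Monte Carlo: mean={mc_mean:.3f}, var={mc_var:.3f}")
```

The linearised moments are exact only if g is linear; here the curvature of exp() biases both the mean and the variance downwards.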

A four-step method for uncertainty analysis
Step 1. Define probability distributions for the uncertain model inputs and parameters
The uncertainty about a quantity of interest is frequently described by defining this quantity as a random variable. Uncertainty about model parameter/input values can be described using different types of probability distributions. The uniform distribution, which gives equal weight to each value within the uncertainty range, is commonly used when the main objective is to understand model behaviour, but more flexible probability distributions are sometimes needed to represent input and parameter uncertainty. When the model input is a discrete variable, for example the number of imported consignments or the number of successful incursions, discrete probability distributions such as the Poisson are often appropriate (e.g., Yen et al. 2010). Among continuous distributions, the well-known Gaussian distribution is often convenient, since it requires only the specification of a mean and a standard deviation. It is often replaced by truncated Gaussian, triangular, or beta distributions, which place lower and upper bounds on the possible values (e.g., Peterson et al. 2009; Yen et al. 2010). When the distribution should be asymmetric, for example when input factors are likely to be near zero, log-normal, triangular, or beta distributions offer a large range of possibilities (e.g., Peterson et al. 2009). When the input variables and parameters are not independent, it is sometimes possible to define multivariate probability distributions, for example a multivariate Gaussian distribution with non-zero covariances. Probability distributions can be derived from expert knowledge and/or from experimental data. Frequentist statistical methods can be used to estimate standard deviations and confidence intervals reflecting uncertainty due to measurement errors and data sampling procedures. Bayesian statistics offer a variety of methods and algorithms to calculate probability distributions by combining expert knowledge and data (e.g., Makowski et al. 2010; Makowski et al. 2011).
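The distribution families listed in Step 1 can all be sampled with NumPy's random `Generator`. In the sketch below every numerical value (Poisson mean, bounds, shape parameters, the correlation of 0.6) is hypothetical and serves only to show the calls; in a real assessment these values would come from data and/or expert knowledge. The truncated Gaussian is obtained here by simple rejection, which is one possible approach among several.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Discrete input: number of imported consignments (hypothetical mean of 12)
consignments = rng.poisson(lam=12, size=n)

def truncated_normal(rng, mean, sd, low, high, size):
    """Rejection sampler: draw N(mean, sd) and keep only values inside [low, high]."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, sd, size)
        out = np.concatenate([out, draw[(draw >= low) & (draw <= high)]])
    return out[:size]

# Bounded continuous inputs on a hypothetical range [10, 20]
trunc_gauss = truncated_normal(rng, mean=15, sd=3, low=10, high=20, size=n)
triangular = rng.triangular(left=10, mode=15, right=20, size=n)
beta_scaled = 10 + 10 * rng.beta(a=2, b=5, size=n)      # asymmetric, bounded in [10, 20]

# Asymmetric positive input likely to be near zero
lognormal = rng.lognormal(mean=-1.0, sigma=0.8, size=n)

# Two dependent inputs: bivariate Gaussian with a non-zero covariance (correlation 0.6)
mean = [12.5, 27.0]
cov = [[2.0, 1.2],
       [1.2, 2.0]]
pair = rng.multivariate_normal(mean, cov, size=n)
print(np.corrcoef(pair[:, 0], pair[:, 1])[0, 1])
```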
In some cases, it is difficult to define reliable probability distributions for all uncertain model inputs, i.e., probability distributions that correctly reflect the current state of knowledge about input values based on available data and expert knowledge. In such cases, it is useful to define several probability distributions and, when possible, to run the analysis for each of them and compare the results. This approach is illustrated in the example below. When the computation time is too long or when it is not possible to run the analysis several times with different distributions, it is important to state the assumptions explicitly and to acknowledge that the results of the analysis might have been different if other probability distributions had been used.
Step 2. Generate values from the distributions defined at Step 1
Simple random sampling is a popular method for generating a representative sample from probability distributions. This sampling strategy provides unbiased estimates of the expectation and variance of random variables. Other sampling techniques, such as Latin hypercube sampling, can also be used, especially when the number of variables is large. It is also possible to generate combinations of values of uncertain factors by using experimental designs, for example complete factorial designs. The latter technique was used by the European Food Safety Authority (EFSA 2008) to combine estimated minimum, maximum, and most likely values of several uncertain input factors. The choice of the sample size, N, is critical because the reliability of the results depends on it. A small N may lead to inaccurate estimates of the mean, variance, or quantiles because the space defined by the uncertain inputs or parameters may not be adequately sampled, so that the resulting approximation of the probability distribution of the model output may be inaccurate. On the other hand, a very large N leads to a large number of model simulations that may be time-consuming without adding new information. The choice of N is thus a compromise between computation time and accuracy.
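The three sampling strategies of Step 2 can be contrasted in a few lines. This is a minimal sketch on hypothetical uniform ranges: the Latin hypercube sampler is written by hand (one draw inside each of N equal-width strata per input, with strata paired at random across inputs) rather than taken from a specific library, and the factorial design simply crosses hypothetical (minimum, most likely, maximum) values, giving 3^p model runs for p inputs.

```python
import itertools
import numpy as np

def simple_random(rng, n, p):
    """Simple random sampling on the unit hypercube [0, 1)^p."""
    return rng.random((n, p))

def latin_hypercube(rng, n, p):
    """Latin hypercube sampling: each of the n equal-width strata of every
    input contains exactly one sampled value."""
    strata = np.arange(n)[:, None]                  # stratum index of each row
    u = (strata + rng.random((n, p))) / n           # one uniform draw inside each stratum
    for j in range(p):
        u[:, j] = rng.permutation(u[:, j])          # pair strata at random across inputs
    return u

def scale(u, lower, upper):
    """Map unit-hypercube samples to uniform ranges [lower, upper]."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + u * (upper - lower)

rng = np.random.default_rng(3)
lower, upper = [10.0, 24.0, 32.0], [15.0, 31.0, 38.0]    # hypothetical ranges
srs = scale(simple_random(rng, 100, 3), lower, upper)
lhs = scale(latin_hypercube(rng, 100, 3), lower, upper)

# Complete factorial design on (minimum, most likely, maximum) values: 3**3 = 27 runs
levels = [(10.0, 12.5, 15.0), (24.0, 27.5, 31.0), (32.0, 35.0, 38.0)]  # hypothetical
factorial = np.array(list(itertools.product(*levels)))
print(srs.shape, lhs.shape, factorial.shape)
```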
Step 3. Compute the model output(s) for each generated input set
Once the parameter/input values have been generated, the next step consists of running the model for each set of parameter/input values. For example, if N is set equal to 100, the model must be run 100 times, leading to 100 values per output variable. This step may be difficult when the computation of model outputs is time-consuming and, with some very complex models, N must be set to a small value due to computation time constraints. This third step is easier with simpler and less computationally intensive models.
Step 4. Describe the distributions of the model outputs
The distribution of the model output values generated at Step 3 can be described and summarized in a number of ways. It is possible to present the distribution graphically using, for example, histograms, density plots, or scatterplots. It is also useful to summarize the distribution of the model output values by its mean, median, standard deviation, and quantiles. All these techniques have been applied in several quantitative risk assessments (e.g., Koch et al. 2009; Peterson et al. 2009; Makowski and Mittinty 2010). When several outputs are considered, it is often useful to study the relationships between different outputs using scatterplots and correlation coefficients.
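Steps 2 to 4 can be chained in a short script. The sketch below assumes a hypothetical entry model (expected number of infested consignments = number of consignments × prevalence) with hypothetical Poisson and beta input distributions; it is not a model from this paper. The model call is vectorised, so element j of the output corresponds to the model run for input set j.

```python
import numpy as np

def summarize(y, probs=(0.01, 0.10, 0.50, 0.90, 0.99)):
    """Step 4: numerical summary (mean, standard deviation, quantiles) of simulated outputs."""
    y = np.asarray(y, dtype=float)
    out = {"mean": float(y.mean()), "sd": float(y.std(ddof=1))}
    for q, value in zip(probs, np.quantile(y, probs)):
        out[f"q{round(100 * q):02d}"] = float(value)
    return out

def toy_entry_model(consignments, prevalence):
    """Hypothetical model: expected number of infested consignments."""
    return consignments * prevalence

rng = np.random.default_rng(7)
N = 1_000
# Step 2: one sampled input set per index (hypothetical distributions)
consignments = rng.poisson(lam=50, size=N)
prevalence = rng.beta(a=1, b=99, size=N)          # mean prevalence 0.01
# Step 3: one model run per input set (vectorised: element j is run j)
y = toy_entry_model(consignments, prevalence)
# Step 4: describe the output distribution
summary = summarize(y)
for key, value in summary.items():
    print(f"{key:>4}: {value:.3f}")
```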

Methods of sensitivity analysis
Sensitivity analysis can be seen as an extension of uncertainty analysis. It consists of computing sensitivity indices to rank uncertain input variables or parameters according to their influence on the model output. Two types of sensitivity analysis are usually distinguished: local sensitivity analysis and global sensitivity analysis (Saltelli et al. 2000). Local SA focuses on the local impact of uncertain quantities on model outputs and is carried out by computing partial derivatives of the output variables with respect to the input/parameter values. With this method, the uncertain quantities are allowed to vary within small intervals around nominal values, but these intervals are not related to the uncertainty ranges of the uncertain model inputs and parameters. In contrast, global SA considers the full domain of uncertainty of the uncertain model quantities (Saltelli et al. 2008). In global SA, the uncertain inputs and parameters are allowed to vary independently within their whole range of variation.
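Local SA as described above can be sketched with central finite differences. In this minimal example the model y = x1² x2, the nominal values (2, 3), and the perturbation of 1% of each nominal value are all hypothetical. The normalised sensitivities (∂y/∂xi)(xi/y) make inputs with different units comparable, but note that the size of the perturbation has nothing to do with the uncertainty ranges of the inputs, which is precisely the limitation of local SA.

```python
import numpy as np

def local_sensitivity(model, nominal, rel_step=0.01):
    """Local SA: central finite-difference partial derivatives of the output at the
    nominal values, plus normalised sensitivities (dy/dx_i) * (x_i / y)."""
    x0 = np.asarray(nominal, dtype=float)
    y0 = model(x0)
    derivs = np.empty_like(x0)
    for i in range(x0.size):
        h = rel_step * abs(x0[i]) if x0[i] != 0 else rel_step   # small step around x_i
        up, down = x0.copy(), x0.copy()
        up[i] += h
        down[i] -= h
        derivs[i] = (model(up) - model(down)) / (2 * h)
    return derivs, derivs * x0 / y0

def toy(x):
    """Hypothetical model y = x1^2 * x2."""
    return x[0] ** 2 * x[1]

d, elasticity = local_sensitivity(toy, [2.0, 3.0])
print("partial derivatives:", d)              # analytically 2*x1*x2 = 12 and x1^2 = 4
print("normalised sensitivities:", elasticity)  # analytically 2 and 1
```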
A sensitivity index is a measure of the influence of an uncertain quantity on a model output variable. Model inputs and parameters whose values have a strong effect on the model output are characterized by high sensitivity indices; less influential quantities are characterized by low sensitivity indices. Sensitivity indices can thus be used to rank uncertain inputs and parameters and to identify those that deserve more accurate measurement or estimation. A large number of global SA methods are available, for example ANOVA, correlation between input factors and model outputs, methods based on Fourier series, and methods based on Monte Carlo simulations (Saltelli et al. 2000). Sensitivity indices can be computed using statistical software (e.g., the sensitivity package of the statistical software R, http://cran.r-project.org/web/packages/sensitivity/index.html) or more specialized software such as Simlab (http://simlab.jrc.ec.europa.eu/), @Risk, or Crystal Ball. @Risk and Crystal Ball can be used with spreadsheet software and include user-friendly interfaces. With all of these tools, users have to define the probability distributions of the uncertain input variables and parameters or, at least, their possible ranges of variation. Users also have to define the values of some tuning parameters, as shown in the example below.
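One of the simplest global indices mentioned above is the correlation between sampled inputs and the model output. The sketch below computes Spearman rank correlations (Pearson correlation of ranks) for a hypothetical additive model y = 5x1 + x2 + 0.1x3 with uniform inputs; the model and sample size are assumptions for illustration. Rank correlations capture monotonic effects only and can miss non-monotonic effects and interactions.

```python
import numpy as np

def ranks(a):
    """Ranks 0..n-1 along axis 0 (continuous inputs, so ties are ignored)."""
    return np.argsort(np.argsort(a, axis=0), axis=0)

def rank_correlation_indices(x, y):
    """Spearman rank correlation between each sampled input (column of x) and the output y."""
    rx = ranks(x).astype(float)
    ry = ranks(y).astype(float)
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    return (rx * ry[:, None]).sum(axis=0) / np.sqrt((rx ** 2).sum(axis=0) * (ry ** 2).sum())

rng = np.random.default_rng(11)
x = rng.uniform(0.0, 1.0, size=(10_000, 3))          # Monte Carlo sample of 3 inputs
y = 5.0 * x[:, 0] + 1.0 * x[:, 1] + 0.1 * x[:, 2]      # hypothetical model
rho = rank_correlation_indices(x, y)
print("rank correlations:", np.round(rho, 3))          # ranks inputs: x1 > x2 > x3
```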

Example
In this section, we present a simple example to show how uncertainty and sensitivity analysis can be used in practice. We consider the simple generic infection model for foliar fungal plant pathogens defined by Magarey et al. (2005):

$$W(T) = \min\left(\frac{W_{\min}}{f(T)},\; W_{\max}\right)$$

$$f(T) = \begin{cases} \left(\dfrac{T_{\max} - T}{T_{\max} - T_{\mathrm{opt}}}\right)\left(\dfrac{T - T_{\min}}{T_{\mathrm{opt}} - T_{\min}}\right)^{\frac{T_{\mathrm{opt}} - T_{\min}}{T_{\max} - T_{\mathrm{opt}}}} & \text{if } T_{\min} \le T \le T_{\max}, \\ 0 & \text{otherwise,} \end{cases}$$

where T is the mean temperature during the wetness period (°C) and W is the wetness duration (h) required to achieve a critical disease intensity (5% disease severity or 20% disease incidence) at temperature T. The model output is W, computed as a function of the input T. T_min, T_opt, and T_max are the minimum, optimal, and maximum temperatures for infection, respectively, and W_min and W_max are the minimum and maximum possible wetness duration requirements for critical disease intensity, respectively. This model was used to compute the wetness duration requirement as a function of temperature for many species and was included in a disease forecast system (Magarey et al. 2005, 2007).
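The infection model can be coded directly from the two equations above. This sketch is in Python rather than the R used in the study, and the parameter values in the usage line are hypothetical (they are not the Table 1 values for Guignardia citricarpa). It treats W as W_max wherever f(T) = 0, which is how the cap W ≤ W_max is read here. Since f(T_opt) = 1, the function returns W_min at the optimal temperature.

```python
import numpy as np

def temperature_response(T, t_min, t_opt, t_max):
    """Temperature response f(T) of the Magarey et al. (2005) model; zero outside [t_min, t_max]."""
    T = np.asarray(T, dtype=float)
    inside = (T >= t_min) & (T <= t_max)
    rise = np.clip((T - t_min) / (t_opt - t_min), 0.0, None)   # clipped so powers stay defined
    fall = np.clip((t_max - T) / (t_max - t_opt), 0.0, None)
    shape = (t_opt - t_min) / (t_max - t_opt)
    return np.where(inside, fall * rise ** shape, 0.0)

def wetness_requirement(T, t_min, t_opt, t_max, w_min, w_max):
    """W(T) = min(W_min / f(T), W_max), with W = W_max wherever f(T) = 0."""
    f = temperature_response(T, t_min, t_opt, t_max)
    ratio = w_min / np.where(f > 0, f, 1.0)                    # avoid dividing by zero
    return np.minimum(np.where(f > 0, ratio, w_max), w_max)

# Hypothetical parameter values, for illustration only
temps = np.arange(10, 41, 5)
w = wetness_requirement(temps, t_min=12.0, t_opt=27.0, t_max=35.0, w_min=14.0, w_max=48.0)
for T, value in zip(temps, w):
    print(f"T = {T:2d} degC -> W = {value:5.1f} h")
```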
T_min, T_opt, T_max, W_min, and W_max are five species-dependent parameters whose values were estimated from experimental data and expert knowledge for different foliar pathogens (e.g., Magarey et al. 2005; EFSA 2008). However, for some species, these parameters are uncertain due to the limited availability of data (Magarey et al. 2005); in such cases, it is important to perform uncertainty and sensitivity analysis.
In this case study, uncertainty and sensitivity analysis techniques were applied to the model defined above for infection of citrus by the fungal pathogen Guignardia citricarpa Kiely. According to EFSA (2008), the parameter values are uncertain for this pathogen. The uncertainty ranges considered in this case study for these parameters are presented in Table 1. All computations were done using R (http://cran.r-project.org), and the code is available on request.
Three series of probability distributions were defined from Table 1:
i. Independent uniform distributions (with lower and upper bounds set equal to the values reported in Table 1).
ii. Independent triangular distributions (with lower and upper bounds set equal to the values reported in Table 1, and the most likely values set equal to the midpoints of the uncertainty ranges).
iii. Triangular distributions with a positive correlation between T_min and T_opt. Values of T_min were first sampled from the triangular distribution defined in ii. Values of T_opt were then generated by adding values sampled from a uniform distribution on (14, 16) to the values of T_min. With this method, T_opt values were always higher than 24 °C and lower than 31 °C, and were correlated with T_min. The parameter T_opt no longer follows a triangular distribution, but the other parameters are still distributed according to the triangular distributions defined in ii.
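The three series of distributions can be generated as follows. Table 1 is not reproduced in this text, so the bounds below are placeholders, not the values used in the study; T_min is restricted to 10-15 °C only so that T_opt = T_min + U(14, 16) stays between 24 and 31 °C, as stated for distribution iii. The sketch checks that iii induces a strong positive correlation between T_min and T_opt and that the triangular distributions are less dispersed than the uniform ones.

```python
import numpy as np

# Placeholder bounds (NOT the Table 1 values)
BOUNDS = {
    "t_min": (10.0, 15.0),
    "t_opt": (24.0, 31.0),
    "t_max": (32.0, 38.0),
    "w_min": (12.0, 20.0),
    "w_max": (35.0, 50.0),
}

def sample_i(rng, n):
    """(i) Independent uniform distributions on the bounds."""
    return {k: rng.uniform(lo, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_ii(rng, n):
    """(ii) Independent triangular distributions, mode at the midpoint of each range."""
    return {k: rng.triangular(lo, (lo + hi) / 2, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_iii(rng, n):
    """(iii) As (ii), but T_opt = T_min + U(14, 16), which correlates T_opt with T_min."""
    s = sample_ii(rng, n)
    s["t_opt"] = s["t_min"] + rng.uniform(14.0, 16.0, n)
    return s

rng = np.random.default_rng(2013)
draws = {name: f(rng, 1_000) for name, f in [("i", sample_i), ("ii", sample_ii), ("iii", sample_iii)]}
r_iii = np.corrcoef(draws["iii"]["t_min"], draws["iii"]["t_opt"])[0, 1]
print(f"corr(T_min, T_opt) under iii: {r_iii:.2f}")
```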
These probability distributions were based on the same information: the lower and upper bounds defined for each model parameter in Table 1. Nonetheless, they describe uncertainty in different ways; the triangular distribution gives higher weights to values located in the middle of the range, and the third set of distributions assumes that two of the five parameters are not independent.
An uncertainty analysis was performed by generating N = 1,000 parameter sets from each of the three probability distributions defined above. Results are presented in Figures 1 (probability distribution i), 2 (probability distribution ii), and 3 (probability distribution iii). The sampled parameter values are more concentrated in the central parts of their uncertainty ranges with the independent triangular distributions (Figure 2) than with the independent uniform distributions (Figure 1). Figure 3 clearly shows that, with distribution iii, T_min and T_opt were positively correlated. The 99%, 90%, 10%, and 1% percentiles and the mean values of the model output W reported for different temperatures show that, with all probability distributions, uncertainty about the wetness duration requirement of the fungus is quite small when the temperature is close to 27-28 °C, but much larger for temperatures below 25 °C or above 32 °C (Figures 1-3). Uncertainty about the wetness duration requirement is smaller with the triangular distributions (Figure 2) than with the uniform distributions (Figure 1).
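The uncertainty analysis behind Figures 1-3 can be sketched as below: for each of the three distributions, N = 1,000 parameter sets are drawn, the model is run at each temperature, and the mean and the 1%, 10%, 50%, 90%, and 99% percentiles of W are stored. The parameter bounds are placeholders (Table 1 is not reproduced here), so the numbers printed are not expected to match the figures; the sketch only shows the structure of the computation.

```python
import numpy as np

# Placeholder bounds (NOT the Table 1 values)
BOUNDS = {"t_min": (10.0, 15.0), "t_opt": (24.0, 31.0), "t_max": (32.0, 38.0),
          "w_min": (12.0, 20.0), "w_max": (35.0, 50.0)}

def wetness_requirement(T, t_min, t_opt, t_max, w_min, w_max):
    """W(T) = min(W_min / f(T), W_max) from Magarey et al. (2005), vectorised over parameters."""
    inside = (T >= t_min) & (T <= t_max)
    rise = np.clip((T - t_min) / (t_opt - t_min), 0.0, None)
    fall = np.clip((t_max - T) / (t_max - t_opt), 0.0, None)
    f = np.where(inside, fall * rise ** ((t_opt - t_min) / (t_max - t_opt)), 0.0)
    return np.minimum(np.where(f > 0, w_min / np.where(f > 0, f, 1.0), w_max), w_max)

def sample_i(rng, n):     # independent uniform
    return {k: rng.uniform(lo, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_ii(rng, n):    # independent triangular, mode at the midpoint
    return {k: rng.triangular(lo, (lo + hi) / 2, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_iii(rng, n):   # triangular, with T_opt = T_min + U(14, 16)
    s = sample_ii(rng, n)
    s["t_opt"] = s["t_min"] + rng.uniform(14.0, 16.0, n)
    return s

SAMPLERS = {"i": sample_i, "ii": sample_ii, "iii": sample_iii}
temps = np.arange(15.0, 36.0, 1.0)
probs = [0.01, 0.10, 0.50, 0.90, 0.99]

results = {}
for name, sampler in SAMPLERS.items():
    theta = sampler(np.random.default_rng(1), 1_000)        # N = 1,000 parameter sets
    rows = []
    for T in temps:
        w = wetness_requirement(T, **theta)                  # 1,000 model runs at temperature T
        rows.append([w.mean(), *np.quantile(w, probs)])
    results[name] = np.array(rows)   # columns: mean, q01, q10, q50, q90, q99

for name, table in results.items():
    for T in (20.0, 27.0, 33.0):
        mean, q01, *_, q99 = table[temps == T][0]
        print(f"{name:>3} T={T:.0f} degC: mean={mean:5.1f} h, 1-99% range {q01:5.1f}-{q99:5.1f} h")
```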
A sensitivity analysis was performed using the Morris method to identify the most influential parameters of the model. The Morris method is frequently used to screen quickly among all uncertain inputs (Saltelli et al. 2000; Monod et al. 2006; Morris 1991). The main steps of the method are:
• Define a design by combining k values of the p uncertain parameters.
• Add a small incremental step Δ to one uncertain parameter z_i.
• Compute an "elementary effect" defined by
$$d_i = \frac{y(z_1, \ldots, z_{i-1}, z_i + \Delta, z_{i+1}, \ldots, z_p) - y(z_1, \ldots, z_p)}{\Delta}$$
where y(·) is the model function and z_1, ..., z_p are the p uncertain parameters.
• Repeat the procedure several times for all uncertain parameters.
• Compute the mean and variance of the elementary effects from r replicates.
A high mean indicates a parameter with an important influence on the output. A high variance shows that the elementary effect depends strongly on the point of the parameter space at which it is computed; it indicates either a parameter that interacts with other parameters or a parameter whose effect is nonlinear. The tuning parameters of the Morris method were set to the following values: k=4, p=5, Δ=2, and r=100. The lower and upper bounds of the model parameters were set equal to the values reported in Table 1. Note that it was implicitly assumed here that the uncertain model parameters were uniformly distributed.
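The steps listed above can be sketched as a simple one-at-a-time screening routine. This is a minimal version, not the implementation used in the study (which relied on R): each replicate starts at a random point of a k-level grid on the rescaled [0, 1] ranges and moves every parameter once, in random order, by Δ = jump/(k − 1), stepping down instead of up when the step would leave the range. Interpreting Δ=2 as a jump of two grid levels with k=4 is an assumption. Elementary effects are expressed per unit of the rescaled range, and the parameter bounds are placeholders, so the output is not expected to reproduce Figure 4.

```python
import numpy as np

def morris_screening(model, bounds, r=100, levels=4, jump=2, seed=0):
    """Morris elementary-effects screening with one-at-a-time trajectories.
    bounds: (p, 2) lower/upper limits; inputs are treated as uniform on these ranges.
    Returns mean, mean absolute value and standard deviation of the elementary effects."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    p = bounds.shape[0]
    grid = np.arange(levels) / (levels - 1)          # k equally spaced levels in [0, 1]
    delta = jump / (levels - 1)                      # step Delta on the [0, 1] scale
    effects = np.empty((r, p))

    def to_physical(u):
        return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

    for rep in range(r):
        x = rng.choice(grid, size=p)                 # random starting point on the grid
        y_old = model(to_physical(x))
        for i in rng.permutation(p):                 # change one parameter at a time
            step = delta if x[i] + delta <= 1.0 + 1e-9 else -delta   # stay inside [0, 1]
            x_new = x.copy()
            x_new[i] += step
            y_new = model(to_physical(x_new))
            effects[rep, i] = (y_new - y_old) / step  # elementary effect d_i
            x, y_old = x_new, y_new
    return effects.mean(axis=0), np.abs(effects).mean(axis=0), effects.std(axis=0, ddof=1)

NAMES = ["T_min", "T_opt", "T_max", "W_min", "W_max"]
BOUNDS = [(10.0, 15.0), (24.0, 31.0), (32.0, 38.0), (12.0, 20.0), (35.0, 50.0)]  # placeholders

def wetness_at(z, T=25.0):
    """Magarey et al. (2005) wetness requirement W at temperature T for z = (T_min, ..., W_max)."""
    t_min, t_opt, t_max, w_min, w_max = z
    if not t_min <= T <= t_max:
        return w_max
    shape = (t_opt - t_min) / (t_max - t_opt)
    f = (t_max - T) / (t_max - t_opt) * ((T - t_min) / (t_opt - t_min)) ** shape
    return min(w_min / f, w_max) if f > 0 else w_max

mean_ee, mean_abs_ee, sd_ee = morris_screening(wetness_at, BOUNDS, r=100, levels=4, jump=2)
for i in np.argsort(-mean_abs_ee):
    print(f"{NAMES[i]:>5}: mean={mean_ee[i]:8.2f}  mean|EE|={mean_abs_ee[i]:8.2f}  sd={sd_ee[i]:8.2f}")
```

On a purely linear model every elementary effect is identical, so the standard deviation is zero; any spread signals nonlinearity or interaction, without telling which.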
Figure 4 shows the mean and the standard deviation of the elementary effects computed using k=4, p=5, Δ=2, and r=100. Results show that the two most influential parameters are T_max and T_opt. The high standard deviations obtained for both parameters reveal either strong nonlinear effects or strong interactions between the two parameters. This result shows that the effects of a change in T_max and T_opt on wetness duration requirements depend on the values of these parameters (nonlinearity) and/or on the values of the other parameters of the model (interaction).

Practical rules
Five rules are presented below to improve the reliability of uncertainty and sensitivity analysis.

Rule 1: Be transparent about assumptions and methods
In some cases, conclusions of UA and SA depend on the assumptions made about the probability distributions of uncertain model inputs. Results may also depend on the method selected to perform UA or SA; for example, the ranking of parameters obtained by SA may depend on the method used to compute sensitivity indices. For these reasons, it is important to be transparent about the assumptions made on probability distributions and to present the methods used for UA/SA in detail.

Rule 2: Define precisely the model output of interest
Figures 1-3 show that the uncertainty range depends strongly on the temperature T. In this example, the uncertainty level can be considered very low or very high depending on the model output considered: simulated wetness duration requirements were characterized by low uncertainty levels for temperatures around 27 °C but by high uncertainty levels for more extreme temperatures. This example shows that conclusions obtained for a given output may not be valid for others.

Rule 3: Assess the accuracy of the estimates
The accuracy of the estimated mean, variance, and quantiles of the probability distribution of the model output depends on the number of simulations. Figure 5 shows the 99%, 90%, 50%, 10%, and 1% percentiles of wetness duration requirements estimated using different numbers of simulations, from 10 to 2,000, for T = 25 °C. Estimates of the 99% percentile of model output W were highly unstable when the number of simulations was lower than 500. In this example, at least 1,000 simulations were required to obtain an accurate estimate of the 99% percentile. This result shows that it is important to check that a sufficiently high number of simulations was used in every analysis. The stability of the computed quantities can be assessed either graphically or by computing variances or confidence intervals, either analytically or with nonparametric techniques such as bootstrapping (Saltelli et al. 2008).
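Both checks suggested in Rule 3 can be sketched in a few lines: tracking the estimated 99% percentile as the number of simulations grows (using nested subsets of one large sample), and attaching a percentile bootstrap confidence interval to the estimate. The model is the infection model at T = 25 °C with placeholder uniform bounds (Table 1 is not reproduced here), so the values differ from those in Figure 5.

```python
import numpy as np

# Placeholder bounds (NOT the Table 1 values)
BOUNDS = {"t_min": (10.0, 15.0), "t_opt": (24.0, 31.0), "t_max": (32.0, 38.0),
          "w_min": (12.0, 20.0), "w_max": (35.0, 50.0)}

def wetness_requirement(T, t_min, t_opt, t_max, w_min, w_max):
    """W(T) = min(W_min / f(T), W_max) from Magarey et al. (2005), vectorised over parameters."""
    inside = (T >= t_min) & (T <= t_max)
    rise = np.clip((T - t_min) / (t_opt - t_min), 0.0, None)
    fall = np.clip((t_max - T) / (t_max - t_opt), 0.0, None)
    f = np.where(inside, fall * rise ** ((t_opt - t_min) / (t_max - t_opt)), 0.0)
    return np.minimum(np.where(f > 0, w_min / np.where(f > 0, f, 1.0), w_max), w_max)

rng = np.random.default_rng(5)
N_MAX = 2_000
theta = {k: rng.uniform(lo, hi, N_MAX) for k, (lo, hi) in BOUNDS.items()}
w = wetness_requirement(25.0, **theta)

# Estimates of the 99% percentile from the first N runs, for increasing N
sizes = [10, 50, 100, 250, 500, 1_000, 2_000]
q99_path = [float(np.quantile(w[:n], 0.99)) for n in sizes]

def bootstrap_quantile_ci(y, q, n_boot=1_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the q-quantile of y."""
    rng_b = np.random.default_rng(seed)
    idx = rng_b.integers(0, y.size, size=(n_boot, y.size))   # resample runs with replacement
    boot = np.quantile(y[idx], q, axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

for n, q in zip(sizes, q99_path):
    print(f"N={n:5d}: estimated 99% percentile = {q:6.2f} h")
for n in (100, 500, 2_000):
    lo, hi = bootstrap_quantile_ci(w[:n], 0.99)
    print(f"N={n:5d}: 95% bootstrap CI for the 99% percentile [{lo:6.2f}, {hi:6.2f}] h")
```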

Rule 4: Assess the robustness of results to distribution assumptions
Another important point to keep in mind is that the results of uncertainty analysis may depend on distribution assumptions. Table 2 shows the median, 95% and 99% percentiles obtained with N = 10,000 Monte Carlo simulations for T = 25 °C using the three types of probability distributions described above. The 99% percentiles obtained with the three distributions were quite different: the 99% percentile was equal to 39.61 h with independent uniform distributions, but was lower with the two other distributions, especially with distribution ii. This example illustrates the importance of assessing the robustness of results to assumptions made about probability distributions. The first step of the uncertainty analysis method described above (Step 1. Define probability distributions for the uncertain model inputs and parameters) is a key step, and it is important to use all available information to derive reliable probability distributions that correctly reflect the current state of knowledge. Although this step is often difficult, the recent development of methods of expert elicitation and of Bayesian techniques offers new possibilities (Makowski et al. 2010; Makowski et al. 2011).
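A robustness check in the spirit of Table 2 amounts to repeating the same Monte Carlo computation under each distribution assumption and comparing the resulting quantiles. The sketch below uses N = 10,000 and T = 25 °C as in the text, but with placeholder parameter bounds, so it does not reproduce the 39.61 h reported above.

```python
import numpy as np

# Placeholder bounds (NOT the Table 1 values)
BOUNDS = {"t_min": (10.0, 15.0), "t_opt": (24.0, 31.0), "t_max": (32.0, 38.0),
          "w_min": (12.0, 20.0), "w_max": (35.0, 50.0)}

def wetness_requirement(T, t_min, t_opt, t_max, w_min, w_max):
    """W(T) = min(W_min / f(T), W_max) from Magarey et al. (2005), vectorised over parameters."""
    inside = (T >= t_min) & (T <= t_max)
    rise = np.clip((T - t_min) / (t_opt - t_min), 0.0, None)
    fall = np.clip((t_max - T) / (t_max - t_opt), 0.0, None)
    f = np.where(inside, fall * rise ** ((t_opt - t_min) / (t_max - t_opt)), 0.0)
    return np.minimum(np.where(f > 0, w_min / np.where(f > 0, f, 1.0), w_max), w_max)

def sample_i(rng, n):     # independent uniform
    return {k: rng.uniform(lo, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_ii(rng, n):    # independent triangular, mode at the midpoint
    return {k: rng.triangular(lo, (lo + hi) / 2, hi, n) for k, (lo, hi) in BOUNDS.items()}

def sample_iii(rng, n):   # triangular, with T_opt = T_min + U(14, 16)
    s = sample_ii(rng, n)
    s["t_opt"] = s["t_min"] + rng.uniform(14.0, 16.0, n)
    return s

SAMPLERS = {"i": sample_i, "ii": sample_ii, "iii": sample_iii}
N, T = 10_000, 25.0
table = {}
for label, sampler in SAMPLERS.items():
    w = wetness_requirement(T, **sampler(np.random.default_rng(25), N))
    table[label] = np.quantile(w, [0.50, 0.95, 0.99])   # median, 95% and 99% percentiles

print(f"{'distribution':<14}{'median':>9}{'95%':>9}{'99%':>9}")
for label, (q50, q95, q99) in table.items():
    print(f"{label:<14}{q50:9.2f}{q95:9.2f}{q99:9.2f}")
```

Large differences between rows, especially in the upper percentiles, indicate that conclusions drawn from the tail of the output distribution depend on the distribution assumptions.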

Rule 5: Be aware of the capabilities of different sensitivity analysis techniques and, when possible, compare results
As mentioned above, several methods are available for uncertainty analysis and, even more, for sensitivity analysis. Not all methods have the same capabilities. For example, the Morris method illustrated in Figure 4 is an SA method that can be used to screen quickly among all uncertain inputs. However, this method cannot distinguish between interactions and nonlinear effects, and other techniques, for example Fourier amplitude sensitivity testing (FAST) and ANOVA, should be applied when a precise analysis of interactions between model inputs is required.
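Variance-based indices separate the two effects that the Morris standard deviation lumps together. The sketch below does not implement FAST or ANOVA: it swaps in the closely related Monte Carlo (Sobol') estimators, using the Saltelli (2010) estimator for first-order indices S_i and the Jansen estimator for total indices ST_i, assuming independent uniform inputs. It is applied to the Ishigami test function, a standard benchmark, rather than to the infection model, so that results can be checked against known analytic values (S ≈ 0.31, 0.44, 0 and ST_3 ≈ 0.24). A gap ST_i − S_i > 0 measures the share of output variance due to interactions involving input i; here the third input has no first-order effect at all but matters through its interaction with the first.

```python
import numpy as np

def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function: nonlinear effects plus an x1-x3 interaction."""
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2 + b * x[:, 2] ** 4 * np.sin(x[:, 0])

def sobol_indices(model, lower, upper, n=100_000, seed=0):
    """First-order (S_i) and total (ST_i) variance-based indices for independent
    uniform inputs, using the Saltelli (2010) and Jansen Monte Carlo estimators."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    p = lower.size
    A = lower + rng.random((n, p)) * (upper - lower)
    B = lower + rng.random((n, p)) * (upper - lower)
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(p), np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                          # A with column i taken from B
        fABi = model(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var       # first-order effect of input i
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var  # total effect, including interactions
    return S, ST

S, ST = sobol_indices(ishigami, [-np.pi] * 3, [np.pi] * 3)
print("first-order:", np.round(S, 3))
print("total:      ", np.round(ST, 3))
print("interaction share (ST - S):", np.round(ST - S, 3))
```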

Conclusion
This paper shows that several factors may influence the results of uncertainty and sensitivity analysis, especially the assumptions made about the probability distributions of the uncertain model inputs and parameters, the number of simulations performed with the model, and the type of model output analyzed by the risk assessor. Due to the influence of each of these factors, the validity of the conclusions of an uncertainty or sensitivity analysis may be limited. Practical rules were presented and illustrated in this paper in order to improve the reliability of uncertainty and sensitivity analyses.

Figure 1. Results of an uncertainty analysis performed with 1,000 Monte Carlo simulations. The upper graphics show the values of four model parameters sampled from uniform distributions. The lower graphics show the resulting distribution of model outputs, their means (thick black line), 10% and 90% percentiles (dashed lines), and 5% and 95% percentiles (dotted lines).

Figure 2. Results of an uncertainty analysis performed with 1,000 Monte Carlo simulations. The upper graphics show the values of four model parameters sampled from independent triangular distributions. The lower graphics show the resulting distribution of model outputs, their means (thick black line), 10% and 90% percentiles (dashed lines), and 5% and 95% percentiles (dotted lines).

Figure 3. Results of an uncertainty analysis performed with 1,000 Monte Carlo simulations. The upper graphics show the values of four model parameters sampled from triangular distributions assuming a positive correlation between T_min and T_opt. The lower graphics show the resulting distribution of model outputs, their means (thick black line), 10% and 90% percentiles (dashed lines), and 5% and 95% percentiles (dotted lines).

Figure 4. Results of the Morris method.

Figure 5. Estimated values of the median (continuous line), 5% and 95% percentiles (dashed lines), and 1% and 99% percentiles (dotted lines) of wetness duration requirements as a function of the number of Monte Carlo simulations.

Table 1. Uncertainty ranges of the five model parameters for Guignardia citricarpa Kiely