Current limitations and future prospects of detection and biomonitoring of NIS in the Mediterranean Sea through environmental DNA

The biodiversity of the Mediterranean Sea is currently threatened by the introduction of Non-Indigenous Species (NIS). Therefore, monitoring the distribution of NIS is of utmost importance to preserve the ecosystems. A promising approach for the identification of species and the assessment of biodiversity is the use of DNA barcoding, as well as DNA and eDNA metabarcoding. Currently, the main limitation in the use of genomic data for species identification is the incompleteness of the DNA barcode databases. In this research, we assessed the availability of DNA barcodes in the main reference libraries for the most updated inventory of 665 confirmed NIS in the Mediterranean Sea, with a special focus on the cytochrome oxidase I (COI) barcode and primers. The results of this study show that there are no barcodes for 33.18% of the species in question, and that 45.30% of the 382 species with a COI barcode have no publicly available primers. This highlights the importance of directing scientific efforts to fill the barcode gap of specific taxonomic groups in order to support the effective application of the eDNA technique for investigating the occurrence and distribution of NIS in the Mediterranean Sea.


Introduction
The Mediterranean Sea represents one of the most important biodiversity hotspots in the world (Myers et al. 2000; Coll et al. 2010; Lejeusne et al. 2010; Marrocco et al. 2019), accounting for more than 17,000 reported marine species (Coll et al. 2010). However, the number of Non-Indigenous Species (NIS) and their impact on native species is steadily increasing (Villèle and Verlaque 1995; Streftaris and Zenetos 2006; Marrocco et al. 2018; Bariche et al. 2020). Therefore, the Mediterranean scientific community highlights the importance of early warning and of monitoring the presence and distribution of NIS (Katsanevakis et al. 2016; Darling et al. 2017; Tsiamis et al. 2020). Nowadays, this is a key requirement for the conservation and management of ecosystems, as stated by Regulation (EU) No 1143/2014 of the European Parliament and of the Council of 22 October 2014 on the prevention and management of the introduction and spread of invasive alien species (Tiralongo et al. 2019).
Until now, the assessment of Mediterranean species diversity has been carried out through traditional methods based solely on morphological identification. These methods present several disadvantages, such as the difficulty in surveying large geographical areas and in spotting and identifying the so-called "hard-to-detect species" (Tiralongo et al. 2020). Besides, they are mainly based on recognizable adult features, such as the shape of gonads or other particular body parts, and often do not provide any identification key for larval forms or early developmental stages (Ponti et al. 2009; Di Sabatino et al. 2014; Pinna et al. 2017). They can also lead to misidentification of individuals whose morphology has been altered by stressful environmental conditions or by sampling and preservation techniques (Leese et al. 2016; Pawlowski et al. 2018; Tiralongo et al. 2020). In addition, traditional phenotype-based methods require the expertise of taxonomists, especially when there is the need to identify a species never observed before in a certain area, including NIS (Leese et al. 2016; Pawlowski et al. 2018).
Consistent biological records can provide a better understanding of the distribution of marine species, their range expansion, and the arrival of new NIS in the Mediterranean basin (Mannino et al. 2019; Bariche et al. 2020). A promising approach for the identification of species and biomonitoring of ecosystems is the use of molecular tools such as DNA barcoding, metabarcoding and environmental DNA (Pawlowski et al. 2018; Specchia et al. 2020; Pinna et al. 2021; Tzafesta et al. 2021). DNA barcoding refers to the identification of a single species using a short DNA fragment, while in metabarcoding the DNA is extracted from a sample containing more than one organism or species, amplified and sequenced by Next Generation Sequencing (NGS) (Ji et al. 2013; Deiner et al. 2017). DNA metabarcoding allows the identification of species at low densities and the detection of taxa that traditional approaches generally fail to distinguish (Pawlowski et al. 2018; Zangaro et al. 2020). Another innovative technique for the identification of species at low concentration is eDNA analysis (Pawlowski et al. 2018), which is based on the extraction of DNA directly from environmental samples such as water or sediment (Rees et al. 2014). This technique can be applied efficiently to assess the presence and distribution of NIS that are hard to detect. It may also prove a valuable tool for monitoring and preventing new NIS arrivals, for instance by analysing the genetic content of ballast waters and tracking it in water exchanges between different regions (Tzafesta et al. 2021).
However, even the application of molecular techniques faces some challenges. The level of uncertainty linked to eDNA in marine environments generally depends on the persistence time of DNA in marine systems (Collins et al. 2018), the high connectivity and movement afforded by the aquatic medium, and the incompleteness of the public reference libraries (Cagnacci et al. 2012; Weigand et al. 2019; Specchia et al. 2020). Once a DNA fragment has been sequenced, it needs to be compared against the reference libraries (for example, with BLAST) to identify the species it belongs to. Moreover, the success of the eDNA technique also depends on the efficiency of the primer sets across large numbers of taxa, a key requirement to correctly amplify the investigated gene in the environmental sample and identify as many species as possible (Elbrecht et al. 2017).
The main DNA barcode reference libraries are GenBank, by the National Center for Biotechnology Information (NCBI), and BOLD (Barcode of Life Data) Systems (Ratnasingham et al. 2007; Leese et al. 2016; Macher et al. 2017). The information available in the reference libraries includes the species name, the nucleotide sequence of the target genes and, optionally, the PCR primer pairs used for the amplification of the gene of interest in the target organism (Ratnasingham et al. 2007; Macher et al. 2017).
The Consortium for the Barcode of Life (CBOL; www.barcoding.si.edu.com) and the International Nucleotide Sequence Database Collaborations (INSDC) designated the mitochondrial cytochrome oxidase subunit I (COI) as the main barcoding gene based on its widespread presence among different taxonomic groups (Hebert et al. 2004a; Hebert et al. 2004b; Saunders 2005; Ward et al. 2005). Moreover, nucleotide sequence polymorphisms of this approximately 500 bp COI barcode region provide valuable information not only on species identification but also on population genetic diversity and structure (Goetze et al. 2016; Abbas et al. 2018; Choo et al. 2020). Several barcoding studies have also identified alternative genes that can be successfully used for molecular barcoding and may be more suitable for a specific taxonomic group. For example, the ribosomal genes 16s and 18s are generally used for the identification of prokaryotes (Stackebrandt 1994; Acina et al. 2004) and eukaryotes (Hadziavdic et al. 2014; Bradley et al. 2016), respectively; the nuclear ribosomal internal transcribed spacer 1 and 2 (ITS) for fungi (Scoch et al. 2012; Badotti et al. 2017); and two plastid genes, the maturase-coding gene (matK) and the large subunit of ribulose 1,5-bisphosphate carboxylase-coding gene (rbcL) for plants (CBOL 2009), among others.
In light of this, we wanted to evaluate the current status of DNA barcode availability for the NIS already detected in the Mediterranean through morphological surveys. To do so, we retrieved the most recent list of NIS published by Zenetos and Galanidi (2020) and looked for the availability of COI barcodes and primers in reference libraries. If COI barcodes were not retrieved, we then searched for other barcoding genes. If no records were found in the reference libraries, this was referred to as a DNA barcode gap.
The aim of this research is to evaluate the current limitations in the application of molecular barcoding due to the barcode gap of Mediterranean NIS, and to investigate in depth the occurrence of COI barcodes and primer pairs. Furthermore, we indicate which taxonomic groups may be underestimated when molecular tools are used for the detection and biomonitoring of NIS in the Mediterranean Sea through environmental DNA.

Checklist of NIS occurring in the Mediterranean Sea
We obtained an updated checklist of confirmed alien species occurring in the Mediterranean Sea using an inventory of NIS published by Zenetos and Galanidi at the start of 2020. In this inventory, a total of 666 marine NIS established in the Mediterranean Sea are divided into 10 high-ranked taxonomic groups, as defined by the authors (Zenetos and Galanidi 2020).
The names of the species were verified using the following platforms: EU-NOMEN (http://www.eu-nomen.eu), FishBase (https://www.fishbase.de), ALGAEBASE (https://www.algaebase.org), EASIN (https://easin.jrc.ec.europa.eu/easin) and WORMS (http://www.marinespecies.org). This resulted in a total number of 665 NIS because we excluded Chaetoceros bacteriastroides, an uncertain (unassessed) species on all the above-mentioned platforms. For each of the 665 NIS, we considered the currently accepted name and all of the synonyms and older names, to ensure that a species recorded as missing is genuinely absent from the reference libraries.

DNA barcode libraries interrogation and data analysis
The accepted names and synonyms of the 665 NIS were manually entered in BOLD Systems and GenBank to search for a COI barcode. If a COI barcode was retrieved, we then looked into the availability of primer pairs and their use across different taxonomic groups. When the COI barcode was not available, we also recorded other genes (5.8s, 12s, 16s, 18s, 28s, cytb, rbcL) to correctly estimate the barcode gap. All the data were compiled in an Excel file available as Suppl. material 1: Table S1, which we used as a starting point to quantify the barcode gap as a percentage of species within each group.
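The searches in this study were performed manually. Purely as an illustrative sketch of how one such lookup could be written down as a query string, the Python function below builds an NCBI Entrez-style search term that covers an accepted name together with its synonyms. The function name, the list of COI gene aliases and the placeholder species names are assumptions made for this example and are not part of the study's protocol.

```python
def build_coi_query(names, gene_aliases=("COI", "COX1", "CO1")):
    """Build an Entrez-style search term for COI records of one species.

    names        -- accepted name followed by any synonyms or older names
    gene_aliases -- assumed spellings of the COI gene name in record annotations
    """
    if not names:
        raise ValueError("at least one species name is required")
    # Any of the names may match the organism field.
    organism = " OR ".join(f'"{name}"[Organism]' for name in names)
    # Any of the assumed aliases may match the gene field.
    gene = " OR ".join(f"{alias}[Gene]" for alias in gene_aliases)
    return f"({organism}) AND ({gene})"


# Placeholder names, not taken from the checklist.
print(build_coi_query(["Genus species", "Genus synonymus"]))
```

Including every synonym in a single term mirrors the reason given above for checking older names: a species should only be counted as a gap when none of its names returns a record.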

COI barcode and primer gap in NIS occurring in the Mediterranean Sea
In total, 665 NIS established in the Mediterranean Sea, belonging to 132 orders, were divided into 10 main taxonomic groups (Fish, Parasites, Phytobenthos, Ascidians, Bryozoa, Crustacea, Miscellanea, Mollusca, Polychaeta, and Zooplankton; Zenetos and Galanidi 2020), and their DNA barcoding gap in reference libraries was investigated. At the end of June 2021, 220 out of 665 NIS did not have any barcodes in reference libraries (BOLD Systems and GenBank), showing a barcoding gap of 33.18%. Of the remaining 445 barcoded species, 14.16% did not have a COI barcode but still presented another gene barcode (Suppl. material 1: Table S1).
For the 382 species associated with a COI barcode in the DNA reference libraries, we further looked into the availability of primer pairs, finding that 45.30% do not have publicly available primer pairs. Moreover, of the 55 primer pairs found across different taxonomic groups, only 4 pairs were used in more than one phylum. They are LCO1490/HCO2198, LCO1490_t1/HCO2198_t1 and C_LepFolF/C_LepFolR found in Chordata, Arthropoda and Mollusca, and jgLCO1490/jgHCO2198 found in Arthropoda, Chordata, Mollusca, Bryozoa and Echinodermata. No universal primer pairs were identified.
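The check for primer pairs shared across phyla can be sketched in a few lines of Python. The mapping below contains only the four multi-phylum pairs and their phyla as listed in this paragraph, plus one single-phylum entry (Tun_Forward/Tun_reverse2, reported for Ascidians) that is added here for contrast on the assumption that it was found in Chordata only. The variable and function names are hypothetical.

```python
# Phyla in which each primer pair was found, as listed in the text.
# The Tun_Forward/Tun_reverse2 entry is an assumed single-phylum example.
primer_phyla = {
    "LCO1490/HCO2198": {"Chordata", "Arthropoda", "Mollusca"},
    "LCO1490_t1/HCO2198_t1": {"Chordata", "Arthropoda", "Mollusca"},
    "C_LepFolF/C_LepFolR": {"Chordata", "Arthropoda", "Mollusca"},
    "jgLCO1490/jgHCO2198": {
        "Arthropoda", "Chordata", "Mollusca", "Bryozoa", "Echinodermata",
    },
    "Tun_Forward/Tun_reverse2": {"Chordata"},
}


def multi_phylum_pairs(mapping, min_phyla=2):
    """Return the primer pairs found in at least `min_phyla` phyla, sorted by name."""
    return sorted(pair for pair, phyla in mapping.items() if len(phyla) >= min_phyla)


for pair in multi_phylum_pairs(primer_phyla):
    print(pair, len(primer_phyla[pair]))
```

Raising `min_phyla` shows how quickly the candidate list shrinks: in this sketch only jgLCO1490/jgHCO2198 remains once more than three phyla are required.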

Barcode and primer pair gaps in taxonomic groups
In the "Bryozoa" group, 30 NIS, divided into 2 orders, have been analysed. Among these, 23 species (76.67%) were not associated with a DNA barcode (Fig. 1), representing the group with the largest gap. Six species present a COI barcode, while only one (Celleporella carolinensis) does not have any public record apart from a partial coding DNA sequence (cds) of the elongation factor 1 alpha. For this taxonomic group, only 1 primer pair (jgLCO1490/jgHCO2198) for COI gene amplification was identified and it is only used in two species of the same genus: Celleporaria aperta and Celleporaria brunea.
The second group with the most extensive barcode gap is represented by "Parasites", consisting of 25 NIS divided into 10 orders. In this group, 14 species (56%) lack a barcode, 8 (32%) have a COI barcode and 3 (12%) have a different gene barcode (Fig. 1). The 3 species lacking COI have a record on GenBank of a coding sequence (cds) annotated as ribosomal subunit, which we identified through BlastN as 28s for Boninia neotethydis and Tetrancistrum polymorphum, and 18s for Thulinia microrchis. Like Bryozoa, Parasites display a substantial COI primer pair gap, having only 2 sets of primers available: HCO2198/LCO1490, only used in Heterosaccus dollfusi, and jgHCO2198/jgLCO1490, only used in Livoneca redmanii.
The group "Mollusca" contains the highest number of NIS, amounting to 156 species divided into 28 orders. Among these, 81 species (51.92%) are not associated with a DNA barcode, 67 (42.95%) have a COI barcode, while 8 (5.13%) have another barcode, mainly represented by 16s, 18s and 28s (Fig. 1). For this taxonomic group, 13 different COI primer pairs have been identified. These primer pairs were tested in 26 species, leaving a COI primer pair gap in 41 species (26.81%).
The group "Crustacea" consists of 83 NIS, divided into 7 orders. Among these, 26 species (31.33%) are not associated with a DNA barcode, 52 species (62.65%) have a COI barcode and 5 species (6.02%) have another barcode. Four out of these 5 species lacking COI present either 12s, 16s or both, while one (Thalamita poissonii), presents Thapmar 1.5 transposon as the only record. For this taxonomic group, 9 different primer pairs for COI amplification were found. These 9 primers were used in 38 species (45.78%), while the remaining 14 species with COI sequence did not have a primer set, resulting in a COI primer pair gap of 16.87%. The most used primer set is HCO2198/LCO1490, found in 26 out of 38 species (68%).
The group "Zooplankton" consists of 38 NIS divided into 14 orders. Among these, 10 (26.32%) are not associated with a DNA barcode, 27 (71.05%) have a COI barcode and only one (Parvocalanus elegans) does not have COI but 28s, instead. This group is the one with the second largest COI primer pair gap, having 4 primer sets used on only 6 out of 27 COI barcoded species, giving a COI primer gap of 77.78%.
The group "Miscellanea" consists of 40 NIS divided into 6 phyla and 21 orders. Among these, 8 species (20%) are not associated with a DNA barcode, 27 (67.50%) have a COI barcode and 5 (12.50%) have either 16s, 18s, or both. For this taxonomic group, eight primer pairs were found, but used only on 8 out of 27 COI barcoded species, leaving a primer pair gap of 70.37%.
The group "Phytobenthos" is the second-largest NIS group, with 113 NIS divided into 27 orders. Among these, 21 species (18.58%) are not associated with a DNA barcode, 58 species (51.33%) have a COI barcode and 34 species (30.09%) have another barcode. This is the group where an alternative barcode gene to COI has been used the most since COI is generally used for barcoding animal species. RbcL is the most represented gene for this group, covering 25 out of 34 species. For this taxonomic group, a total of 12 primer sets for COI amplification were found. These primers were used on 26 out of 58 species, leaving a COI primer pair gap of 44.83% (Fig. 2). The most used primer set is GWSFn/GWSRx, present in 17 out of 26 species (65.38%).
The group "Ascidians" consists of 26 NIS divided into 3 orders. Among these, 4 species (15.38%) are not associated with a DNA barcode and 22 species (84.62%) have a COI barcode. No other barcoding genes were found. Five primer sets were identified, covering 13 out of 22 barcoded species, leaving a primer pair gap of 40.91%. The most used primer sets are jgLCO1490/jgHCO2198 and Tun_Forward/Tun_reverse2, used in 6 and 7 species, respectively.
The group "Fish" consists of 89 NIS divided into 15 orders. Among these, only 3 species (3.37%) are not associated with a barcode, 85 species (95.51%) have a COI barcode and 1 (Caesio varilineata) does not have a COI barcode, but a 12s, instead (Fig. 1). For this taxonomic group, 19 different primer pairs for COI gene amplification have been identified. These 19 primer pairs cover a total of 78 species, leaving only 8.24% of COI barcoded species without a primer set (Fig. 2). The most used primer sets are C_FishF1t1/C_FishR1t1 and VF2/VR1, found in 54 and 51 species respectively, 34 of which present both primer sets.
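The gap percentages reported for each group follow from simple proportions. As a minimal sketch, the Python helper below recomputes a gap from the number of covered species and the group total, using three sets of counts reported above (Bryozoa barcodes, Miscellanea and Ascidians COI primers). The function name and the dictionary layout are hypothetical and serve only to make the arithmetic explicit.

```python
def gap_percent(covered, total):
    """Percentage of `total` species that are NOT covered, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    return round(100 * (total - covered) / total, 2)


# Barcode gap: (species with any barcode, species in the group).
barcode_coverage = {"Bryozoa": (7, 30)}

# COI primer gap: (species with a primer pair, COI-barcoded species).
primer_coverage = {"Miscellanea": (8, 27), "Ascidians": (13, 22)}

for group, (covered, total) in barcode_coverage.items():
    print(f"{group}: barcode gap {gap_percent(covered, total)}%")
for group, (covered, total) in primer_coverage.items():
    print(f"{group}: COI primer gap {gap_percent(covered, total)}%")
```

Note that the denominator differs between the two measures: the barcode gap is taken over all NIS in a group, whereas the primer gap in these examples is taken over the COI-barcoded species only.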

Discussion
The aim of this study was to quantify the extent of the DNA barcode gap for the NIS established in the Mediterranean Sea, as identified by Zenetos and Galanidi in 2020 and, in doing so, to direct the efforts of the scientific community towards specific taxonomic groups. The data show that 33% of NIS do not have any record in public libraries, making it impossible to detect these species through DNA barcoding techniques. Bryozoa and Parasites are the groups with the largest gap relative to the total number of species in each group. However, Mollusca, which covers almost 25% of the total number of NIS, also needs attention, having a barcode gap in about 52% of its species, followed by Polychaeta with a gap of 41%. On the other hand, Phytobenthos, Ascidians and especially Fish appear to be the groups that could most readily be identified through molecular techniques, having smaller barcode gaps of 19%, 15% and 3%, respectively. Our analysis highlighted the importance of analysing barcode gaps in reference libraries for the successful application of molecular tools (including eDNA and DNA metabarcoding) in biomonitoring assessments. Gap-analysis surveys focusing on DNA barcode presence in public repositories for different groups of species have recently been gaining greater attention from the scientific community. Gap-analysis has already been applied to marine NIS (Duarte et al. 2021), to macrofauna of a region of the North Sea (Hestetun et al. 2020), to aquatic macroinvertebrates of South-East Italy, to marine macroinvertebrates of the Atlantic Iberia (Leite et al. 2020), and to Ascidians and Cnidarians of the European Register of Marine Species (ERMS; Paz and Rinkevich 2021). However, to our knowledge, this is the first study that investigates the DNA barcode gap for NIS occurring in the Mediterranean Sea.
This study also confirms that COI is a useful genetic marker because it is broadly sequenced across different phyla, making it a good candidate gene for identifying species in an environmental sample. Nonetheless, relying on only one DNA fragment may lead to misidentification in pooled samples due to possible sequence similarity; this is why multigene approaches should be preferred in molecular biomonitoring studies (Zou et al. 2012; Chesters et al. 2015; Gangan et al. 2019). Phytobenthos could be the first group to apply this approach, having 30% of species already barcoded with a gene other than COI, most often rbcL. This is probably explained by the fact that rbcL is a standard barcode for plants (CBOL 2009; Maloukh et al. 2017; Kang et al. 2017; Weigand et al. 2019), further proving the advantage of selecting not only universal barcodes but also relevant taxa-specific genes.
Moreover, the success of eDNA metabarcoding is based on the availability of efficient primer sets for the amplification of several taxa in a given sample (Elbrecht et al. 2017; Tzafesta et al. 2021). However, only 26% of NIS occurring in the Mediterranean Sea appear to have publicly available COI primers. Especially for animals, 147 species out of 324 COI barcoded species (45.40%) were lacking primer pairs, which highlights the need for further evaluation of primers or the design of new ones. In addition, no universal primer pair was identified, resulting in more laborious molecular identifications where an environmental sample needs to be amplified with several sets of taxa-specific or even species-specific primers to be correctly assessed.
For the above reasons, the primer pair gap also needs to be filled. To do so, it is essential both to increase the surveys of NIS occurring in the Mediterranean Sea and to improve barcoding studies and biodiversity assessments at a global scale. Although the content of the databases doubles approximately every 18 months (https://www.ncbi.nlm.nih.gov/genbank/statistics/), many of the NIS established in the Mediterranean Sea probably come from underdeveloped regions, which cannot financially support molecular surveys. This can be inferred from the "Data Releases" page provided by BOLD Systems (https://www.boldsystems.org/index.php/datarelease). For this reason, we not only encourage collaboration among researchers in this sector, but also stress the need for training and inclusion of researchers from developing countries, which represent the current and, probably, the future source of new and hard-to-detect NIS.
In conclusion, it is essential to underline that molecular techniques represent a great opportunity to improve the study of the occurrence and distribution of NIS. Hence, a specific gap needs to be filled by the scientific community to make molecular identification fully efficient and independent at a regional, national, and transnational level.