Towards an open, zoomable atlas for invasion science and beyond

Biological invasions are on the rise, and their global impacts on ecosystems, economies and human health are a major challenge. Invasion science is critical to mitigate invader impacts, yet due to the strong increase of data and information in this area, it has become difficult to acquire and maintain an overview of the field. As a result, existing evidence is often not found, knowledge is too rarely transferred to practice, and research is sometimes conducted in pursuit of dead ends. We propose to address these challenges by developing an interactive atlas of invasion science that can be extended to other disciplines in the future. This online portal, which we aim to create in the course of the project described here, will be an evolving knowledge resource and open for anyone to use, including researchers, citizen scientists, practitioners and policy makers. Users will be able to zoom into the major research questions and hypotheses of invasion science, which are connected to the relevant studies published in the field and, if available, the underlying raw data. The portal will apply cutting-edge visualization techniques, artificial intelligence and novel methods for knowledge synthesis.


Introduction
The number of non-native species has been strongly increasing over time worldwide, and there is currently no sign that this trend will stop (Seebens et al. 2017). Non-native species (also called alien species) are those species that have been intentionally or unintentionally transported to and introduced in areas outside their natural range (Blackburn et al. 2011; Jeschke et al. 2013). Some of these species establish and spread in their new ranges and/or cause detrimental impacts on ecosystems, economies or human health; these species are called invasive species. Invasion science, the study of non-native (including invasive) species and their environments, is therefore highly relevant to prevent and manage negative consequences for biodiversity, socio-economics and human health (IPBES 2019).
However, due to an exponential increase of data and information in invasion science, it has become difficult to acquire and maintain an overview of the field (Enders et al. 2018, 2019). This makes research relatively ineffective and inefficient, as existing evidence is often not found, collaboration opportunities are missed, and research is too often conducted in pursuit of dead ends. In addition, knowledge is slow to reach practice, as practitioners are often unable to locate the experts and knowledge relevant to their problems. This knowledge is scattered across tens of thousands of research papers. Similar challenges can be observed in many other research fields (Kraker et al. 2021b). John Naisbitt's remark from the 1980s that we are "drowning in information but starved for knowledge" (p. 24 in Naisbitt 1982) thus seems more applicable than ever before (see also Burke 2020). We need novel tools to take full advantage of published scientific findings.
Along these lines, the philosopher of science Philip Kitcher wrote in his book "Science in a democratic society": "Even when informed and well-intentioned scientists try to think broadly about research options, their discussions suffer from the absence of a synthetic vision. Instead of pitting one partial perspective against another, it would be preferable to create a space in which the entire range of our inquiries could be soberly appraised. We would do well to have an institution for the construction and constant revision of an atlas of scientific significance" (p. 127 in Kitcher 2011). We strongly agree that such an atlas would be extremely useful, and propose to take significant steps in this direction with the project outlined here.
Existing tools to explore the scientific literature have key drawbacks. Both Clarivate Analytics' Web of Science and Elsevier's Scopus are large literature databases behind a paywall, and are thus only accessible to researchers at institutions whose libraries are both financially able and willing to cover hefty subscription fees. The exact amount of these fees varies according to the size of the subscribing institution. For example, in 2019 the Texas A&M University Libraries paid ca. US$ 212,000 for the Web of Science and ca. US$ 140,000 for Scopus (Tabacaru 2019). While there has been public debate, and outcry, about high subscription fees for research journals, which cause critical financial challenges for science libraries even in affluent countries, it often goes unnoticed that there are other strong paywalls in the scientific universe, such as those for literature databases. If researchers do not have access to tools that help them explore and discover scientific publications, they cannot truly stand "on the shoulders of giants", but instead need to reinvent and reinvestigate what others have already done. In addition, even researchers with access to these databases are not allowed to share the data they reuse. For example, data downloaded from these databases usually cannot be provided along with the articles analyzing them, so the analyses cannot be reproduced by others.
The freely searchable literature database Google Scholar is probably the tool used by most researchers without access to either the Web of Science or Scopus. Google Scholar is far from an ideal research tool, though. It has remained largely unchanged since its launch in 2004. Its search results are not reproducible by others, which is a problem for scientists, for example when they aim to perform a systematic literature review. Search hits in Google Scholar are produced by a black-box algorithm that may return different results depending on where and with which user profile a search was done. Furthermore, Google Scholar returns a list of possibly relevant papers in text form, but such a format does not allow users to grasp, and thus take advantage of, the many papers that are often available for a given scientific topic or search string.
A visual navigation tool would be much more powerful for taking advantage of Big Data (Börner 2014; Vargas-Quesada et al. 2017). The innovative discovery infrastructure Open Knowledge Maps (https://openknowledgemaps.org) provides visual maps when users type in keywords characterizing a scientific topic (Kraker et al. 2019). Open Knowledge Maps is the main driver behind the powerful open source knowledge mapping framework Head Start (Kraker et al. 2020). Head Start provides an interactive, web-based visualization interface and comes with a sophisticated artificial-intelligence backend that is capable of automatically producing knowledge maps from a variety of data, including text, metadata and references (Kraker et al. 2016). Head Start is used in a number of systems and projects, including the H2020 projects OpenUP and TRIPLE, the OpenAIRE Tender Project VIPER (Kraker et al. 2018) and the EOSC Secretariat project CoVis (Kraker et al. 2021b). Another key resource is Wikidata, a free and openly editable knowledge base that also holds structured data on scholarly publications (Ayers et al. 2019; Waagmeester et al. 2019). Yet although this database has high potential (e.g. Waagmeester et al. 2020; Rutz et al. 2021), it currently does not systematically cover the different scientific disciplines. As preliminary work, we therefore added >26,000 publications from the field of invasion science to Wikidata, where they can be explored in a domain-general way through tools like Scholia (https://tools.wmflabs.org/scholia/topic/Q42985020; Nielsen et al. 2017; Rasberry et al. 2019).
Another challenge of existing approaches for exploring scientific publications is that they do not link these publications to the big research questions, concepts and hypotheses of research fields. The novel hierarchy-of-hypotheses (HoH) approach makes such links possible (Jeschke et al. 2012; Heger and Jeschke 2014; Jeschke and Heger 2018; Heger et al. 2021). A first visualization based on the HoH approach, in which 12 hypotheses in the field of invasion science are connected to >1100 studies, is available at https://hi-knowledge.org. We have also explored approaches to create networks of research hypotheses (Enders et al. 2018, 2019). These approaches can also be used to create networks of research questions, making the tools applicable to research disciplines without established major hypotheses.
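To illustrate how a hypothesis network can be represented, the following minimal Python sketch links two hypotheses whenever their descriptions share key concepts, weighting each link by the Jaccard index of the two concept sets. This is an illustrative assumption, not the method used by Enders et al.; the concept sets are hypothetical placeholders rather than curated data, and the same code would apply unchanged to research questions described by concepts.

```python
from itertools import combinations

# Hypothetical concept sets for four invasion hypotheses (placeholders, not curated data).
HYPOTHESIS_CONCEPTS = {
    "enemy release": {"enemies", "non-native species", "release"},
    "evolution of increased competitive ability": {"enemies", "evolution", "competitive ability"},
    "biotic resistance": {"native community", "resistance", "competition"},
    "novel weapons": {"allelopathy", "native community", "competitive ability"},
}


def jaccard(a: set, b: set) -> float:
    """Share of concepts two hypotheses have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(concepts: dict, threshold: float = 0.0) -> list:
    """Return (hypothesis, hypothesis, weight) edges whose concept overlap exceeds `threshold`."""
    edges = []
    for h1, h2 in combinations(sorted(concepts), 2):
        weight = jaccard(concepts[h1], concepts[h2])
        if weight > threshold:
            edges.append((h1, h2, round(weight, 3)))
    return edges


if __name__ == "__main__":
    for h1, h2, weight in build_network(HYPOTHESIS_CONCEPTS):
        print(f"{h1} -- {h2}: {weight}")
```

With these placeholder concepts, the sketch links the enemy release hypothesis to the evolution of increased competitive ability through the shared concept "enemies", and links novel weapons to both biotic resistance and the evolution of increased competitive ability.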

Objectives and approach
We aim to develop a prototype of a unique interactive atlas of invasion science that can be extended to other disciplines in the future. This interactive knowledge portal will (a) build on the strengths of Open Knowledge Maps in organizing and visualizing scientific knowledge, (b) connect it to Wikidata and (c) be conceptually based on the HoH approach. Like Google Maps, for example, the portal will be a zoomable navigation tool. In our case, users will be able to zoom into the field's conceptual structure, its big and smaller research questions, its major hypotheses and more specific operational hypotheses. All of these are connected to the relevant studies published in the field and, if available, the underlying raw data. It will be an openly accessible web portal providing FAIR open data (Wilkinson et al. 2016), with all software developed under an open source license. As a literature database with search functions, it will complement Google Scholar, where the data cannot be openly reused, and other literature databases such as the Web of Science and Scopus, which are extremely expensive and whose data are not reusable either (see above). Here, the focal research field is invasion science, although the web portal will be set up so that it can evolve over time and cover other research fields in the future.
The working title of the proposed knowledge portal is enKORE: EvolviNg KnOwledge Resource. enKORE will be an interactive atlas of up-to-date knowledge that "connects the dots". It will have the following key features:
1. Suitably licensed publications will be made available as full text and connected to the raw data if these are available in an open format. If the raw data or publications are not freely available, key metadata, such as authors, title and abstract, will be provided together with a link to the journal's website, preferably via persistent identifiers such as DOIs.
2. An interactive and zoomable visualization of research topics, in which major research questions are hierarchically structured into more specific questions and, if applicable, linked to concepts and hypotheses in the field, which are in turn structured into more specific hypotheses. The publications and raw data will be linked to these questions and hypotheses (Fig. 1). This feature will, for example, allow users to easily find publications on similar questions and hypotheses by zooming into and out of the conceptual map.
3. Interactive on-demand analyses, allowing users to select studies done in a particular country, region or ecosystem, or focusing on a particular (group of) species. At present, such analyses are typically carried out once by researchers summarizing and analyzing the results of studies for a given research question or hypothesis. The results of such analyses are then published as a static paper, and it is not possible to easily repeat the same analyses (i) after some time has passed and the evidence base has changed, or (ii) with one or more settings changed, such as additionally including studies following a methodology that the original authors did not consider relevant, or studies focusing on animals rather than plants. enKORE will allow for interactive analyses that can be repeated on demand. By including automated processes, it will, for example, be possible to receive notifications about updated analyses.
Figure 1. … (see Enders et al. 2018, 2019, 2020 for details about these hypotheses), which can be further divided into sub-hypotheses (shown for the enemy release hypothesis; cf. Jeschke 2014; Jeschke and Heger 2018), and (c) other features of the publications and data. enKORE's hierarchical structure will allow users to zoom from research questions into hypotheses, sub-hypotheses, publications and data, or, vice versa, to zoom out from publications and data to the hypotheses and research questions they address.
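The zooming described in feature 2 and Figure 1 amounts to walking a hierarchy in two directions. The minimal Python sketch below assumes a strict tree (in practice a publication may address several hypotheses, which would require a graph); the class name `Node` and its methods are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One level of the hierarchy: research question, hypothesis, sub-hypothesis or publication."""
    label: str
    kind: str  # e.g. "question", "hypothesis", "sub-hypothesis", "publication"
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, label: str, kind: str) -> "Node":
        """Attach a more specific node below this one and return it."""
        child = Node(label, kind, parent=self)
        self.children.append(child)
        return child

    def zoom_in(self) -> List["Node"]:
        """All nodes below this one, depth-first (e.g. every study under a hypothesis)."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.zoom_in())
        return found

    def zoom_out(self) -> List["Node"]:
        """Ancestors from the direct parent up to the root (e.g. from a study to its question)."""
        path, node = [], self.parent
        while node is not None:
            path.append(node)
            node = node.parent
        return path


if __name__ == "__main__":
    question = Node("Why are some non-native species more successful than others?", "question")
    erh = question.add("Enemy release hypothesis", "hypothesis")
    sub = erh.add("Sub-hypothesis (placeholder)", "sub-hypothesis")
    study = sub.add("Study 1 (placeholder)", "publication")
    print([n.label for n in question.zoom_in()])  # zoom in from the question
    print([n.label for n in study.zoom_out()])    # zoom out from the study
```

Keeping a parent reference on each node makes zooming out a simple walk to the root, while zooming in is a depth-first traversal of the children.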
The web portal will improve shared understanding within and across disciplinary contexts, increase collaboration and enable easier knowledge transfer to education and practice. Our vision is that it will foster theory-building within the discipline and, at the same time, allow transfer of knowledge to other parts of society. The approach developed in this project can easily be transferred to other fields, extending its benefits far beyond invasion science and harnessing the potential of increased digitization to improve the effectiveness and efficiency of global research.

Project structure
These features will be developed in five work packages: (WP1) conceptual classification system integrating research questions and invasion hypotheses; (WP2) interactive evidence synthesis; (WP3) semantic data structures based on WP1 that will automatically ingest the literature into Wikidata; (WP4) engaging with the research and Wiki community; and (WP5) data-driven visualization techniques based on artificial intelligence (Fig. 2).

Work package 1: conceptual classification system integrating research questions and invasion hypotheses
This work package will be based on the hierarchy-of-hypotheses approach and hypothesis networks (see above for references). The website hi-knowledge.org (https://hi-knowledge.org) is a first attempt to combine these two approaches, as it features a zoomable (hierarchically structured) hypothesis network. However, it only includes 12 hypotheses in the field of invasion science, whereas Enders et al. (2018, 2019, 2020) show hypothesis networks with more than 30 invasion hypotheses. In addition, we propose here to also include studies that address research questions without reference to established hypotheses. A core task of WP1 will thus be to create, based on our preliminary work, a conceptual classification system into which all research studies on biological invasions can be integrated. We will construct a hierarchical network of research questions into which major invasion hypotheses (see Ricciardi et al. 2013; Enders et al. 2018, 2019, 2020; Schulz et al. 2019) will be integrated (cf. Fig. 1). This is possible because research hypotheses are based on research questions. For example, several of the hypotheses in Enders et al. (2018, 2019, 2020) relate to the question of why some non-native species have a higher invasion success than others; other hypotheses relate to the question of why some ecosystems are more vulnerable to biological invasions than others. On the other hand, not all research questions are related to established hypotheses, as for some questions a major hypothesis does not (yet) exist. This is, for example, the case for observed differences in introduction pathways among non-native species of different taxonomic groups (Saul et al. 2017).
The >1100 publications included in Jeschke and Heger (2018) and hi-knowledge.org are so far organized according to hierarchical representations of major hypotheses, but not yet according to research questions. A second important task of WP1 will thus be to manually classify these publications according to the newly developed scheme, so that we have a full set of expert-validated links to >1100 publications. This will be done jointly with collaborators and students interested in conceptual work. Such a manual classification is important both as a benchmark for comparison and as training data for the algorithm-based classification (WP3).
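Because the expert-validated links serve as a benchmark, the agreement between automated and manual classifications could be quantified per category. The minimal Python sketch below computes per-category precision and recall, assuming for simplicity that each publication has exactly one category; the publication identifiers and labels in the example are placeholders.

```python
from collections import Counter


def evaluate(manual: dict, predicted: dict) -> dict:
    """Per-category precision and recall of predicted labels against expert labels.

    Both dicts map a publication identifier to a single category; only
    publications present in both dicts are compared.
    """
    shared = manual.keys() & predicted.keys()
    tp, fp, fn = Counter(), Counter(), Counter()
    for pub in shared:
        truth, guess = manual[pub], predicted[pub]
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # wrongly assigned to `guess`
            fn[truth] += 1   # missed for `truth`
    scores = {}
    for cat in set(tp) | set(fp) | set(fn):
        p_den, r_den = tp[cat] + fp[cat], tp[cat] + fn[cat]
        scores[cat] = {
            "precision": tp[cat] / p_den if p_den else 0.0,
            "recall": tp[cat] / r_den if r_den else 0.0,
        }
    return scores


if __name__ == "__main__":
    manual = {"pub1": "enemy release", "pub2": "enemy release",
              "pub3": "biotic resistance", "pub4": "biotic resistance"}
    predicted = {"pub1": "enemy release", "pub2": "biotic resistance",
                 "pub3": "biotic resistance", "pub4": "biotic resistance"}
    print(evaluate(manual, predicted))
```

In the placeholder example, the enemy release category reaches a precision of 1.0 but a recall of only 0.5, because one enemy release study was misassigned to biotic resistance.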
In addition to research questions and hypotheses, research studies on biological invasions can also be structured according to other factors, such as taxonomic groups (given as scientific names and in several languages), regions in which a study was performed, authors or groups of authors (cf. Lokatis and Jeschke 2018) who performed the studies, the research approach that was applied (experimental vs. observational studies; field vs. enclosure vs. laboratory studies) or the timing of the invasions (Fig. 1). In WP1, we will decide, based on expert and user feedback (see WP4), which features of publications will be included as available information in the future webtool. The aim is to allow future users of enKORE to decide on their own which criteria they want to apply for structuring or filtering the literature.

Work package 2: interactive evidence synthesis
The website hi-knowledge.org (https://hi-knowledge.org) not only presents a hierarchical network of invasion hypotheses, but also shows the level of empirical support for hypotheses according to the published literature. In WP2, we will integrate this information into the new web portal enKORE for the >1100 publications included in Jeschke and Heger (2018) and hi-knowledge.org. We will extend the filtering options developed in WP1 and WP5 so that they take the respective levels of evidence into account. enKORE will thus allow users to perform interactive analyses of the level of evidence for specific hypotheses, filtered according to taxonomic group, region, research method and other factors (cf. Fig. 1). In this way, users can assess whether a specific hypothesis has proven useful for the taxonomic group or type of ecosystem they are interested in, or check the robustness of hypotheses across different research approaches (e.g. experimental vs. observational studies, lab vs. field studies).
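The filtered evidence analyses could work along the following lines. In this minimal Python sketch, each study record carries the hypothesis it tests, a few filterable attributes and an evidence category; the field names and the three categories (supporting, undecided, questioning) are assumptions for illustration, as are the example records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one study; field names are illustrative."""
    hypothesis: str
    taxon: str      # e.g. "plants", "animals"
    habitat: str    # e.g. "terrestrial", "aquatic"
    approach: str   # e.g. "experimental", "observational"
    evidence: str   # assumed categories: "supporting", "undecided", "questioning"


def summarize(studies, hypothesis: str, **filters) -> dict:
    """Count evidence categories for one hypothesis among studies matching every filter.

    `filters` are attribute=value pairs such as taxon="plants" or approach="experimental".
    """
    selected = [
        s for s in studies
        if s.hypothesis == hypothesis
        and all(getattr(s, key) == value for key, value in filters.items())
    ]
    counts = Counter(s.evidence for s in selected)
    n = len(selected)
    return {
        "n": n,
        "counts": dict(counts),
        "share_supporting": counts["supporting"] / n if n else 0.0,
    }


# Placeholder records, not real studies.
STUDIES = [
    Study("enemy release", "plants", "terrestrial", "observational", "supporting"),
    Study("enemy release", "plants", "terrestrial", "experimental", "questioning"),
    Study("enemy release", "animals", "aquatic", "observational", "supporting"),
    Study("enemy release", "plants", "aquatic", "observational", "undecided"),
    Study("biotic resistance", "plants", "terrestrial", "experimental", "supporting"),
]

if __name__ == "__main__":
    print(summarize(STUDIES, "enemy release", taxon="plants"))
    print(summarize(STUDIES, "enemy release", approach="observational"))
```

Because the summary is recomputed from the current records each time it is called, the same analysis can be rerun after new studies have been added or with different filters, which is what makes on-demand analyses repeatable.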
The information on the level of evidence for or against major hypotheses in invasion science summarized in hi-knowledge.org has been manually extracted from the literature. Integration of additional publications and continuous updates will only be possible with the help of novel approaches, including automated methods. A second work step in WP2 will therefore be to review existing approaches, e.g. for extracting this information from publications, and to assess options for integrating such tools into enKORE in the future. Existing contacts with experts developing such tools will be very useful in this context, for example the teams behind the Open Research Knowledge Graph (ORKG, Auer et al. 2021) and the Biodiversity Community Integrated Knowledge Library (BiCIKL).

Work package 3: semantic data structures
In WP3, we will build semantic data structures (also known as knowledge graphs) in Wikidata that are based on persistent identifiers for publications, authors, research questions, hypotheses and the relationships between them, focal non-native species, study locations, research methods etc. (Fig. 1). To the extent possible, we will build on existing ontologies and controlled vocabularies (an ontology is a formal representation of the concepts and other key properties of a subject area and how they are related to each other). In a preliminary project carried out in collaboration with Birgitta König-Ries, Ria Stangneth and Alsayed Algergawy from Friedrich Schiller University Jena, Germany, we have already started to build an ontology for the main concepts included in the 12 invasion hypotheses featured in https://hi-knowledge.org (Algergawy et al. 2020). We will also work on mechanisms to automatically identify publications in invasion science, to annotate them with the precise subjects they address and to classify them according to their relationship to the identified hypotheses. These classifications will be a first, imperfect iteration, and they will need to be reviewed and curated by experts in the field (WP4). Such experts can themselves be identified through queries to the Wikidata-based knowledge graph set up in this work package, as can relevant datasets, publications, species, study sites, institutions, conferences, funders and changing trends over time. Since Wikidata uses Semantic Web standards and its data are in the public domain, other knowledge graphs such as ORKG (Auer et al. 2021) will be able to reuse and build on the curation work performed within enKORE.
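The expert queries mentioned above could be run against the public Wikidata Query Service. The Python sketch below ranks authors by the number of works whose main subject (property P921) is a given topic, counting only authors with their own Wikidata item (property P50). It assumes that the invasion science publications are linked to the topic via P921, and uses the item Q42985020 from the Scholia link above only as an example; the User-Agent string is a hypothetical placeholder.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def experts_query(topic_qid: str, limit: int = 20) -> str:
    """SPARQL query: authors ranked by number of works whose main subject (P921) is the topic."""
    return (
        "SELECT ?author ?authorLabel (COUNT(DISTINCT ?work) AS ?works) WHERE {\n"
        f"  ?work wdt:P921 wd:{topic_qid} ;\n"
        "        wdt:P50 ?author .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "GROUP BY ?author ?authorLabel\n"
        "ORDER BY DESC(?works)\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query: str) -> dict:
    """Send a query to the Wikidata Query Service and return the JSON response (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent; WDQS asks clients to identify themselves.
    request = urllib.request.Request(url, headers={"User-Agent": "enKORE-sketch/0.1 (example)"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def parse_experts(response: dict) -> list:
    """Turn SPARQL JSON results into (QID, name, number of works) tuples."""
    rows = []
    for binding in response["results"]["bindings"]:
        qid = binding["author"]["value"].rsplit("/", 1)[-1]
        name = binding.get("authorLabel", {}).get("value", qid)
        rows.append((qid, name, int(binding["works"]["value"])))
    return rows


if __name__ == "__main__":
    for qid, name, works in parse_experts(run_query(experts_query("Q42985020", limit=10))):
        print(qid, name, works)
```

Keeping query construction, network access and result parsing in separate functions means the query text can be inspected, and the parser tested, without contacting the live service.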

Work package 4: engaging with the research and Wiki community
It will be critical that enKORE is user-friendly and that we engage with the research community, citizen scientists, the Wiki community and other stakeholders, such as managers, teachers, policy makers and science journalists. We will do this through workshops and online videos, including a tutorial, in which we explain both the benefits of using enKORE and how it can be used. Wikidata's multilinguality facilitates collaboration among people who do not share a common language, which makes it possible to bring professional researchers together with citizen scientists from around the world, e.g. for specific regions or taxa, or from platforms like iNaturalist that are increasingly being integrated with Wikidata. In the future, when enKORE grows beyond invasion science, we will first target related fields in biodiversity science, so that the community will grow in parallel with enKORE's coverage. The enKORE tool itself will, at least initially, only be available in English, but multilinguality will be helpful for future extensions.
For the current project, we aim to organize two large workshops to engage with researchers, the Wiki community and other stakeholders. In these workshops, we will introduce the tools we propose to develop, discover user demands, conduct user tests including options for data curation, and receive feedback. To foster our engagement with user groups, we will additionally develop and distribute a demo and promotion video plus a tutorial in several languages (at least English, German, French, Spanish and Chinese).
This engagement with user groups also serves an additional purpose. As outlined in WP1 above, we have already manually classified more than 1100 publications in the field of invasion science and plan to use this classification to train the algorithm developed in WP3. However, the algorithm will not be perfect and will make classification mistakes; it will only provide a rough classification of publications in the field of invasion science. It will be critical that these automated classifications are checked by experts and, if necessary, corrected. We will invite users to provide these corrections online and will use them to further improve the algorithm.
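The correction loop described above can be sketched with a deliberately simple classifier. The WP3 algorithm is not yet specified, so the Python sketch below swaps in multinomial naive Bayes over words in abstracts: expert labels and user corrections are stored per publication, a correction overwrites the earlier label, and the model is refitted. The class name, publication identifiers and abstracts are hypothetical placeholders.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list:
    """Lower-case alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, refitted whenever labels change."""

    def __init__(self):
        self.examples = {}  # publication id -> (abstract, label)
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def add(self, pub_id: str, text: str, label: str) -> None:
        """Store a label; a later call with the same id acts as a correction."""
        self.examples[pub_id] = (text, label)

    def fit(self) -> None:
        """Recount words per label from all current examples."""
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for text, label in self.examples.values():
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, text: str):
        """Return the label with the highest log posterior, or None if untrained."""
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_label / n_docs)
            for word in tokens(text):
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayes()
    model.add("pub1", "natural enemies herbivores pathogens release", "enemy release")
    model.add("pub2", "fewer herbivores and pathogens attack invaders", "enemy release")
    model.add("pub3", "diverse native communities resist invaders through competition", "biotic resistance")
    model.fit()
    abstract = "invaders escape specialist enemies"
    model.add("pub4", abstract, model.predict(abstract))  # provisional automated label
    model.add("pub4", abstract, "enemy release")          # expert-checked label overwrites it
    model.fit()
```

Storing labels by publication identifier keeps each correction idempotent: resubmitting a corrected label simply replaces the previous one before the next refit.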
We are confident that researchers will be highly interested in enKORE due to its novel features, particularly because the exact nature of these features will be specified by the users themselves. This co-design element of the project will be possible thanks to the workshops and online channels. In addition, researchers will have an interest in having their publications correctly included in the database; hence, invasion scientists will have an incentive to curate their data and improve the algorithm-based classifications where necessary.

Work package 5: data-driven visualization techniques
In WP5, we will develop visualizations and visual search capabilities to enable exploration and discovery of the database developed in WPs 1-3. To create dynamic, two-dimensional representations of the field of invasion science, we will merge machine learning and natural language processing with symbolic reasoning enabled by the semantic data structures (cf. WP3; for further information about the approach, see Kraker 2015; Kraker et al. 2015, 2016). We will then implement a number of data-driven visualizations to provide these representations in an interactive, web-based format.
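One common way to turn a document collection into the topical areas of a knowledge map is to compare documents by text similarity and cluster them. The minimal Python sketch below uses TF-IDF vectors, cosine similarity and average-linkage agglomerative clustering; it is a simplified stand-in and not a description of Head Start's actual pipeline, and the layout step that would position the areas in two dimensions is omitted. The example abstracts are placeholders.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for a list of texts; words in every document get weight 0."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return [{w: c * math.log(n / df[w]) for w, c in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, n_clusters):
    """Average-linkage agglomerative clustering of documents into `n_clusters` areas."""
    vecs = tfidf_vectors(docs)
    sim = {(i, j): cosine(vecs[i], vecs[j]) for i, j in combinations(range(len(docs)), 2)}
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of two clusters.
        pairs = [sim[min(i, j), max(i, j)] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Placeholder abstracts, not real publications.
DOCS = [
    "herbivores and pathogens release invasive plants",
    "enemy release from herbivores in invasive plants",
    "native species richness gives biotic resistance to invasion",
    "biotic resistance of species rich native communities",
]

if __name__ == "__main__":
    print(cluster(DOCS, 2))
```

With the placeholder abstracts, the two enemy release texts and the two biotic resistance texts end up in separate areas, since they share vocabulary only within each pair.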
The visualizations will be based on design concepts for different types of knowledge maps:
• A visual search within the Wikidata corpus on invasive species that enables topical overviews
• Two variations of the visual search, e.g. a knowledge map for a given hypothesis or a timeline showing the development of research questions over time
• A browse view that allows for hierarchical exploration of the whole corpus
These design concepts will be refined in collaboration with the research and Wiki community as part of the workshops we will organize in WP4. We will carry out two user tests:
1. Different visualizations will be shown at a workshop where we will discuss these in groups with the participants to gather input for the visualizations.
2. A usability test to evaluate the first iteration of the visualizations will be carried out at a second workshop where we will discuss these in groups to gather feedback for the second iteration.
The data-driven visualizations will be implemented in our award-winning knowledge mapping framework Head Start, and will be made available open source during the development phase.

Call for participation, timeline and outlook
This ambitious project aims to take important steps towards an open and interactive atlas of knowledge, in invasion biology and beyond. If you are interested in contributing to it in one way or another, then please do not hesitate to contact us. We invite contributions by interested individuals and organizations with a focus on invasion science or other disciplines. We have started to think more deeply about applications in restoration and urban ecology as well as in freshwater biodiversity research, and also look forward to collaborations in these and other research fields. Strengthening connections to portals with citizen science data (e.g. iNaturalist) will also be very useful, and initiatives like Wikidata's WikiProject Biodiversity can help with this.
The project outlined here is scheduled to run from September 2021 to February 2024. Beyond this time period, it will be important to continue improving the atlas of knowledge, so that it will thrive and its underlying technology remains state of the art. The sustainability of such online tools is critical, hence we are aiming to secure long-term support for the atlas of knowledge. To reach this goal, we will not only apply for future grants: the sustainability of the atlas will also be supported by its integration with Wikidata right from the beginning, as it is part of the Wikipedia ecosystem that has a strong and sustainable community-based funding model centered around small donations from millions of users each year.