"FAIR-by-Design" Artifacts: Enriching Publications and Software with FAIR Scientific Information at the Time of Creation Slideshow uri icon

abstract

  • Presentation on the idea of "FAIR-by-Design" Artifacts at the NFDI4Ing Conference 2023.

Abstract: In several research disciplines, the use and development of software have become integral to research, with researchers reporting in publications the results obtained with software and the concepts implemented in software. Consequently, publications and software have become two core artifacts in academia, with increasing importance for measuring research impact and reputation. The research community has made great efforts to improve digital access to publications and software. However, even now that these artifacts are available in digital form, researchers still encapsulate the scientific information in static and relatively unstructured documents that are poorly suited to communication. The next step in the digital transformation of scholarly communication requires a more flexible, fine-grained, context-sensitive, and semantic representation of scientific information so that it is understandable, processable, and usable by humans and machines.

Researchers need support in the form of infrastructures, services, and tools to organize FAIR scientific information from publications and software. Several research disciplines work on initiatives to organize scientific information, e.g., machine learning with “Papers-with-Code”, invasion biology with “Hi-Knowledge”, and biodiversity with “OpenBiodiv”. However, these initiatives are often technically diverse and limited to the respective application domain. For this reason, we from the task area Ellen of NFDI4Ing (in collaboration with NFDI4DataScience and NFDI4Energy) decided to use the Open Research Knowledge Graph (ORKG), an innovative infrastructure for organizing scientific information from publications and software. The ORKG is a cross-discipline research knowledge graph that offers all research communities an easy-to-use and sustainably governed infrastructure.
This infrastructure implements best practices, such as FAIR principles and versioning, with services combining manual crowd-sourcing and (semi-)automated approaches to support researchers in producing, curating, processing, and (re-)using FAIR scientific information from publications and software. As a result, organized scientific information is openly available in the long term and can be understood, processed, and used by humans and machines. Thus, research communities can constantly build, publish, maintain, (re-)use, update, and expand organized scientific information in a long-term and collaborative manner.

While the ORKG currently focuses on organizing scientific information from already published publications and software, we aim to help researchers create “FAIR-by-Design” artifacts to improve their storage, access, and (re-)use, using the ORKG as exemplary infrastructure. The idea of “FAIR-by-Design” artifacts is that the creators of an artifact describe it with extensive and FAIR information once, at the time of creation. This FAIR information is embedded directly into the artifact so that it is available to anyone at any time. Specifically, we developed two tools (SciKGTeX for publications and DataDesc for software) that support researchers, in their roles as authors and developers, in enriching their publications and software with FAIR scientific information at the time of writing and development, embedded into the respective artifact.

SciKGTeX is a LaTeX package to annotate research contributions directly in LaTeX source code. Authors can enrich their publications with structured, machine-actionable, and FAIR scientific information about their research contributions. SciKGTeX embeds the annotated contribution data into the PDF’s XMP metadata so that the FAIR scientific information persists for the lifetime of the artifact. DataDesc is a toolkit that combines different tools to describe software with machine-actionable metadata.
Developers can describe Python software and its interfaces with extensive metadata by annotating individual classes and functions directly within the source code. DataDesc converts all metadata into an OpenAPI-compliant YAML file, which various tools can render and process.

Regarding the research data management (RDM) lifecycle, both tools target the production phase to support researchers in creating “FAIR-by-Design” artifacts. Creating “FAIR-by-Design” artifacts helps to improve their storage, leading to better access to artifacts and thus laying the foundation for their effective (re-)use.

Using the ORKG as exemplary infrastructure, we demonstrate with two proofs of concept how infrastructure providers can use the artifacts from SciKGTeX and DataDesc to store the FAIR scientific information in their systems. In the case of SciKGTeX, the ORKG recently added a new upload feature for SciKGTeX-annotated PDFs to allow researchers to add the FAIR scientific information of publications quickly and easily. In addition, the ing.grid journal provides a version of its LaTeX template that integrates SciKGTeX. For DataDesc, we plan such an upload feature and similar use by the community in future work.

Researchers only need to create a “FAIR-by-Design” artifact once and can reuse it on multiple infrastructures to improve its dissemination and discoverability. With improved storage, researchers can more easily discover and access publications and software to determine whether an artifact fulfills their information needs. However, researchers do not have to rely on such infrastructures to find, access, and assess publications or software. When they encounter a “FAIR-by-Design” artifact, the artifact itself embeds the additional information, so that they can review it with the same information base. Improved discoverability and accessibility lay the foundation for effective (re-)use, as researchers can better understand an artifact.
In the case of the ORKG, we can even (re-)use the information from SciKGTeX and DataDesc stored in the ORKG interchangeably. A publication annotated with SciKGTeX can reference software annotated with DataDesc and stored in the ORKG, and vice versa. Overall, enabling researchers to create “FAIR-by-Design” artifacts is a promising approach to support the downstream phases of storage, access, and (re-)use in the RDM lifecycle.

In our presentation, we want to explain the idea of “FAIR-by-Design” artifacts in more detail using concrete examples based on the two tools and in combination with the ORKG. We believe that the idea of “FAIR-by-Design” artifacts is of interest to the research community. The two tools can inspire other researchers to extend our original approaches and develop new ones to create more “FAIR-by-Design” artifacts by enriching artifacts with FAIR scientific knowledge at the time of creation. Furthermore, we hope to encourage and motivate researchers to use our tools more intensively and thus help establish them. In particular, the existing and planned future integration with the ORKG and the existing collaboration with the ing.grid journal are incentives for researchers to use SciKGTeX and DataDesc actively.
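
Illustrative sketch (not part of the original abstract): the abstract describes annotating Python functions in the source code and converting the collected metadata into an OpenAPI-compliant description. The Python example below shows that general idea under stated assumptions. The decorator `describe`, the helper `to_openapi`, the `x-unit` extension field, and the `/function_name` path scheme are all hypothetical names chosen for illustration; they are not DataDesc's actual interface. The output is printed as JSON, which YAML 1.2 parsers also accept, to avoid a third-party YAML dependency.

```python
import inspect
import json

# Hypothetical registry of annotated functions (an assumption for this sketch;
# DataDesc's real mechanism may differ).
_REGISTRY = []

# Map Python type annotations to OpenAPI / JSON Schema type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe(summary, unit=None):
    """Attach descriptive metadata to a function (illustrative decorator)."""
    def wrap(func):
        func.__metadata__ = {"summary": summary, "unit": unit}
        _REGISTRY.append(func)
        return func
    return wrap


def to_openapi(title, version):
    """Collect all annotated functions into an OpenAPI-style document (a dict)."""
    paths = {}
    for func in _REGISTRY:
        parameters = []
        for name, param in inspect.signature(func).parameters.items():
            parameters.append({
                "name": name,
                "in": "query",
                # A parameter without a default value is treated as required.
                "required": param.default is inspect.Parameter.empty,
                "schema": {"type": _TYPE_NAMES.get(param.annotation, "string")},
            })
        operation = {
            "summary": func.__metadata__["summary"],
            "parameters": parameters,
            "responses": {"200": {"description": "Successful call"}},
        }
        if func.__metadata__["unit"]:
            # "x-" prefixed fields are OpenAPI specification extensions.
            operation["x-unit"] = func.__metadata__["unit"]
        paths["/" + func.__name__] = {"get": operation}
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


@describe("Convert a temperature from degrees Celsius to kelvin.", unit="K")
def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15


spec = to_openapi("demo-tools", "0.1.0")
print(json.dumps(spec, indent=2))
```

Because the metadata sits next to the function it describes, the description can be regenerated whenever the code changes, which mirrors the "describe once, at the time of creation" idea in the abstract.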

authors

publication date

  • 2023