Abstract
Managing Research Digital Objects (RDOs), such as datasets, publications, and software, according to the FAIR principles (findable, accessible, interoperable, and reusable) [1] is fundamental to ensuring transparency, reproducibility, and accessibility in modern science [2]. However, existing research data management (RDM) systems face persistent challenges, including metadata heterogeneity, limited contextualization, and difficulties in integrating distributed knowledge sources. These barriers impede scientific discovery and hinder the reuse of research data across disciplines. The Leibniz Data Manager (LDM) [3] addresses these challenges by making RDO collections more machine-actionable through Semantic Web technologies and by integrating Knowledge Graphs (KGs) with Neuro-Symbolic AI [4]. Designed and maintained by the Technische Informationsbibliothek (TIB), LDM currently manages information about more than 216,900 datasets spanning diverse research fields. It enables researchers to link datasets to external knowledge bases, such as Wikidata and the Open Research Knowledge Graph (ORKG) [5], and supports federated query processing across the interlinked KGs [6]. A natural language interface lets users explore these federated scientific KGs interactively, making them accessible to users with different levels of expertise. Key features include metadata representation with standardized vocabularies (e.g., DCAT, DataCite), entity linking for metadata enrichment, and dynamic KG updates that keep research metadata current and trustworthy. LDM also provides visualization and dataset comparison tools that help researchers evaluate and contrast datasets based on shared attributes. Live code execution via integrated Jupyter Notebooks supports interactive and reproducible data analysis workflows, fostering active research engagement.
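To make the idea of comparing datasets "based on shared attributes" concrete, the following is a minimal sketch, not LDM's implementation. It assumes each dataset's metadata has been flattened into sets of values keyed by DCAT and Dublin Core property names (dcat:keyword, dcat:theme, dct:license, dct:publisher), and it scores overlap with a Jaccard index over (property, value) pairs. The class DatasetRecord, the function compare_datasets, and all example identifiers and values are hypothetical. The similarity measure is likewise an illustrative choice, since the abstract does not state how LDM compares datasets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical flattened metadata record: property name -> set of values."""
    identifier: str
    properties: dict[str, set[str]] = field(default_factory=dict)

    def pairs(self) -> set[tuple[str, str]]:
        # Expand the record into (property, value) pairs for set comparison.
        return {(prop, value) for prop, values in self.properties.items() for value in values}


def compare_datasets(a: DatasetRecord, b: DatasetRecord) -> dict:
    """Contrast two records by their shared and distinct metadata attributes."""
    pairs_a, pairs_b = a.pairs(), b.pairs()
    # Values both datasets share, grouped by property (empty overlaps omitted).
    shared: dict[str, set[str]] = {}
    for prop in sorted(a.properties.keys() & b.properties.keys()):
        common = a.properties[prop] & b.properties[prop]
        if common:
            shared[prop] = common
    union = pairs_a | pairs_b
    return {
        "shared": shared,
        "only_a": sorted(pairs_a - pairs_b),
        "only_b": sorted(pairs_b - pairs_a),
        # Jaccard index over (property, value) pairs; 1.0 for two empty records.
        "jaccard": len(pairs_a & pairs_b) / len(union) if union else 1.0,
    }


# Hypothetical example records using real DCAT / Dublin Core property names.
temps = DatasetRecord("example:temperature-series", {
    "dcat:keyword": {"climate", "temperature"},
    "dcat:theme": {"environment"},
    "dct:license": {"CC-BY-4.0"},
})
rain = DatasetRecord("example:rainfall", {
    "dcat:keyword": {"climate", "precipitation"},
    "dct:license": {"CC-BY-4.0"},
    "dct:publisher": {"Example Org"},
})

result = compare_datasets(temps, rain)
print(result["shared"])             # {'dcat:keyword': {'climate'}, 'dct:license': {'CC-BY-4.0'}}
print(round(result["jaccard"], 3))  # 0.333
```

Reporting the shared and distinct pairs alongside the single score mirrors a side-by-side view: the score ranks candidates, while the pair lists explain where two datasets agree or differ.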
A distinctive feature of LDM is its use of Neuro-Symbolic AI for query processing and metadata enhancement. By combining symbolic reasoning with Large Language Models (LLMs), LDM enables federated queries across multiple KGs and thereby supports cross-domain scientific discovery. This integration contextualizes research data, allowing users to retrieve relevant information from interconnected sources efficiently. The system architecture comprises two primary components: (A) RDO Management, which covers metadata annotation, entity linking, and KG updates; and (B) RDO Exploration and Visualization, which offers dataset comparison, federated search and exploration, and interactive tools for exploratory data analysis. A semantic model ensures consistency and interoperability across all knowledge representations. Four use cases demonstrate LDM's capabilities: (1) RDO Exploration, where researchers retrieve scientific resources via natural language queries over federated KGs; (2) Enhancing RDO Understanding, where side-by-side dataset comparisons reveal similarities and differences; (3) Enhancing RDO Data, where datasets are linked to external knowledge sources to enrich their metadata; and (4) Extracting Data from Textual Articles, where symbolic reasoning combined with LLMs generates metadata automatically from scientific text. By bridging the gap between static metadata and dynamic, interconnected knowledge environments, LDM contributes to more efficient, transparent, and reproducible scientific research. Future work will focus on further automating metadata generation, strengthening integration with NFDI consortia, improving multilingual access, and extending support, through the TIB Terminology Service, for domain-specific ontologies such as UMLS¹ for biomedical metadata or the Open Energy Ontology² for energy metadata.
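The core mechanics of a federated query over interlinked KGs can be illustrated with a deliberately naive sketch. It assumes three hypothetical in-memory triple sets standing in for LDM's KG, Wikidata, and the ORKG. For each triple pattern it selects the sources that use the pattern's predicate, evaluates the pattern against those sources, and joins the partial results on shared variables at the mediator. All identifiers (ldm:dataset42, wd:ExampleTopic, orkg:ExamplePaper, the ex:usesDataset predicate) and function names are hypothetical. This is not LDM's query engine, whose source selection and join strategies the abstract does not describe, and the LLM-based natural language layer is omitted.

```python
from __future__ import annotations

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple) -> Binding | None:
    """Unify one triple pattern with one triple; return variable bindings or None."""
    binding: Binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable repeated within the pattern must bind to the same term.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def select_sources(pattern: Triple, sources: dict[str, set[Triple]]) -> list[str]:
    """Naive source selection: keep sources that use the pattern's predicate."""
    pred = pattern[1]
    return [name for name, triples in sources.items()
            if is_var(pred) or any(t[1] == pred for t in triples)]


def join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """Nested-loop join of two binding lists on their shared variables."""
    return [{**lb, **rb} for lb in left for rb in right
            if all(lb[v] == rb[v] for v in lb.keys() & rb.keys())]


def federated_query(bgp: list[Triple], sources: dict[str, set[Triple]]) -> list[Binding]:
    """Evaluate a basic graph pattern over several sources, joining at the mediator."""
    results: list[Binding] = [{}]
    for pattern in bgp:
        partial: list[Binding] = []
        for name in select_sources(pattern, sources):
            for triple in sources[name]:
                binding = match(pattern, triple)
                if binding is not None:
                    partial.append(binding)
        results = join(results, partial)
    return results


# Hypothetical stand-ins for three interlinked KGs.
sources: dict[str, set[Triple]] = {
    "ldm": {
        ("ldm:dataset42", "dct:title", "Example soil moisture dataset"),
        ("ldm:dataset42", "dct:subject", "wd:ExampleTopic"),  # entity link
    },
    "wikidata": {("wd:ExampleTopic", "rdfs:label", "soil science")},
    "orkg": {("orkg:ExamplePaper", "ex:usesDataset", "ldm:dataset42")},
}

# "Which papers use a dataset, and what is the label of that dataset's subject?"
query = [
    ("?paper", "ex:usesDataset", "?ds"),
    ("?ds", "dct:subject", "?topic"),
    ("?topic", "rdfs:label", "?label"),
]

for row in federated_query(query, sources):
    print(row["?paper"], row["?ds"], row["?label"])  # orkg:ExamplePaper ldm:dataset42 soil science
```

No single source can answer this query alone: the paper-to-dataset link lives in one graph, the entity link in another, and the human-readable label in a third. A production engine would instead send subqueries to remote SPARQL endpoints and plan joins by cost rather than with nested loops.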