SciD-QuESt: From Scientific Documents to Knowledge – Questionnaire-Based Extraction and Structuring of Knowledge in the Open Research Knowledge Graph with LLMs and Human Validation
Structuring knowledge as a side quest in the research journey.
- Co-Applicant: Dr. Oliver Karras
- My role: Principal investigator
- Funding: €35,000.00 from NFDIxCS Flexfund
- Duration: 01.2026 – 07.2026
The goal of SciD-QuESt is to empower researchers to transform scientific documents into structured, reusable knowledge without requiring expertise in semantic technologies. By combining intuitive, questionnaire-based workflows with large language models (LLMs) and human validation, the project lowers the barrier to contributing high-quality, FAIR metadata to infrastructures such as the Open Research Knowledge Graph (ORKG).
SciD-QuESt addresses the current disconnect between manual knowledge extraction (e.g., in literature reviews or artifact evaluations) and machine-readable, semantically rich representations. It offers a semi-automated, researcher-friendly approach that integrates seamlessly into Open Science workflows and supports reproducibility, transparency, and comparability across disciplines.
Core Components
- Bidirectional transformation: Design a bidirectional transformation between questionnaires and ORKG templates, enabling researchers to start from either intuitive question sets or existing semantic structures — and convert between them as needed.
- LLM-powered extraction: Implement an LLM-powered extraction pipeline that processes scientific PDFs and suggests answers to user-defined questions, linking each suggestion to its source text as evidence.
- Human-in-the-Loop validation: Develop a Human-in-the-Loop validation interface where researchers can review, accept, or refine the LLM’s suggestions, ensuring quality and trustworthiness before exporting the structured data to JSON or importing it directly into the ORKG.
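To make the three components more concrete, the following is a minimal sketch of how they might fit together. It is not the project's implementation: the `Question` and `Suggestion` classes, the simplified template dictionary, and all function names are hypothetical and do not reflect the actual ORKG template schema or API. The LLM call itself is left out; a suggestion is simply assumed to arrive with an answer and the source passage that supports it.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Question:
    """One questionnaire item (hypothetical model)."""
    key: str                 # machine-readable identifier
    text: str                # wording shown to the researcher
    datatype: str = "text"   # expected type of the answer


@dataclass
class Suggestion:
    """An LLM-proposed answer plus the source passage used as evidence."""
    question_key: str
    answer: str
    evidence: str
    status: str = "pending"  # pending | accepted | refined


def questionnaire_to_template(label: str, questions: list) -> dict:
    """Map each question onto one property of a simplified template-like dict."""
    return {
        "label": label,
        "properties": [
            {"id": q.key, "label": q.text, "datatype": q.datatype}
            for q in questions
        ],
    }


def template_to_questionnaire(template: dict) -> list:
    """Inverse direction: turn every template property back into a question."""
    return [
        Question(p["id"], p["label"], p["datatype"])
        for p in template["properties"]
    ]


def review(suggestion: Suggestion, refined_answer: Optional[str] = None) -> Suggestion:
    """Record the researcher's decision: accept as is, or refine the answer."""
    if refined_answer is None:
        return Suggestion(suggestion.question_key, suggestion.answer,
                          suggestion.evidence, "accepted")
    return Suggestion(suggestion.question_key, refined_answer,
                      suggestion.evidence, "refined")


def export_json(suggestions: list) -> str:
    """Serialize only suggestions that have passed human validation."""
    validated = [asdict(s) for s in suggestions if s.status != "pending"]
    return json.dumps(validated, indent=2, ensure_ascii=False)


# Usage with made-up example data.
questions = [
    Question("research_problem", "What research problem does the paper address?"),
    Question("sample_size", "How many participants took part?", "integer"),
]
template = questionnaire_to_template("Example study", questions)
round_trip = template_to_questionnaire(template)

proposed = Suggestion("sample_size", "42", "We recruited 42 participants.")
print(export_json([review(proposed)]))
```

In this sketch the transformation is lossless because each question maps to exactly one property and back. Real templates may carry constraints or nested structures that need a richer mapping. Keeping the evidence passage attached to every suggestion is what lets a reviewer check an answer against its source before it is exported.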
By weaving together natural language interfaces, AI-assisted extraction, and semantic integration, SciD-QuESt delivers a flexible, extensible minimum viable product (MVP) that can be embedded into platforms such as the NFDIxCS portal. It supports use cases such as metadata enrichment, structured artifact reporting, and Open Science competitions, helping researchers, chairs, and reviewers extract key insights from submissions with greater consistency and efficiency.
This approach ensures that researchers can contribute structured knowledge without needing to learn RDF, ontologies, or SPARQL. Instead, they engage through familiar formats — questionnaires — and receive AI support with transparent, editable outputs. The result is a scalable, FAIR-compliant workflow for knowledge curation that is both accessible and rigorous.
