WikiRAG-TR is a dataset of 6K (5999) question-answer pairs that were synthetically created from the introduction sections of Turkish Wikipedia articles. The dataset is intended for Turkish Retrieval-Augmented Generation (RAG) tasks.

Dataset Information
Number of Instances: 5999 (5725 synthetically generated question-answer pairs, 274 augmented negative samples)
Dataset Size: 20.5 MB
Language: Turkish
Dataset License: apache-2.0
Dataset Category: Text2Text Generation
Dataset Domain: STEM and Social Sciences
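
As a quick sanity check on the instance counts listed above, the following minimal sketch recomputes the total and the share of each subset. The function name `composition` and the constant names are illustrative assumptions, not part of the dataset or any library; only the three numbers come from this card.

```python
# Counts as stated in the dataset card.
TOTAL_INSTANCES = 5999
GENERATED_PAIRS = 5725    # synthetically generated question-answer pairs
NEGATIVE_SAMPLES = 274    # augmented negative samples


def composition(generated: int, negative: int) -> dict:
    """Return the total and the fractional share of each subset.

    Hypothetical helper for illustration only.
    """
    total = generated + negative
    return {
        "total": total,
        "generated_share": generated / total,
        "negative_share": negative / total,
    }


summary = composition(GENERATED_PAIRS, NEGATIVE_SAMPLES)

# The two subsets should add up to the reported number of instances.
assert summary["total"] == TOTAL_INSTANCES

print(f"generated pairs:  {summary['generated_share']:.1%}")
print(f"negative samples: {summary['negative_share']:.1%}")
```

Under these counts, negative samples make up roughly one in twenty instances, which may matter when evaluating how often a RAG model should decline to answer.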

WikiRAG-TR Pipeline
The dataset was created in two main phases, each illustrated by a separate diagram.