KoMT-Bench is a benchmark designed to evaluate how well language models follow instructions in Korean. It is an in-house dataset created by translating the MT-Bench [1] dataset into Korean and modifying some questions to reflect the characteristics and cultural nuances of the Korean language. After the initial translation and modification, we asked expert linguists to conduct a thorough review of the benchmark dataset.

To run evaluations on KoMT-Bench, please visit the official KoMT-Bench GitHub repository, which provides the evaluation scripts.