This paper presents DebugEval, a benchmark for evaluating the code debugging ability of Large Language Models (LLMs), and proposes COAST, a framework for synthesizing training data with multiple communicative agents.
DebugEval defines four task scenarios, BUG Localization, BUG Identification, Code Repair, and Code Review, to comprehensively evaluate the code debugging capability of LLMs.
COAST coordinates multiple collaborating agents to synthesize training data that improves the code debugging capability of LLMs.
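For concreteness, a DebugEval-style task instance could look roughly like the sketch below. The field names are illustrative assumptions, not the released schema; consult the dataset files for the actual format.

```python
# Illustrative sketch of a DebugEval-style instance (field names are
# assumptions for illustration; see the released dataset for the real schema).
example = {
    "task": "bug_localization",   # one of the four DebugEval task scenarios
    "language": "python",
    "buggy_code": "def add(a, b):\n    return a - b",  # contains a seeded bug
    "question": "Which line contains the bug?",
    "answer": "return a - b",
}
```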
You can clone the repository using the following commands:

```bash
git clone https://github.com/NEUIR/COAST
cd COAST
```
Download the dataset we provide:

```bash
cd src
```

Please refer to src/README.md for more details.
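Once downloaded, the data can be inspected with a few lines of Python. This is a minimal sketch assuming the dataset is distributed as JSON Lines; the file path below is a placeholder, so use the actual paths documented in src/README.md.

```python
import json

# Placeholder path -- substitute the actual file from the downloaded dataset.
with open("data/debugeval.jsonl", "r", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(f"Loaded {len(examples)} examples")
print(examples[0])
```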
We use DeepSeek-Coder-6.7B-Ins and Llama3-8B-Ins as the base models and train them with the COAST framework.
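For reference, the Hugging Face Hub identifiers for these base models are, to the best of our knowledge, the ones below; verify them against the training configurations in this repository before use.

```python
# Assumed Hugging Face Hub IDs for the two base models (verify against the
# training configs shipped with this repository).
BASE_MODELS = {
    "DeepSeek-Coder-6.7B-Ins": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "Llama3-8B-Ins": "meta-llama/Meta-Llama-3-8B-Instruct",
}
```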
```bash
cd neural_compiler
```

Please refer to neural_compiler/README.md for more details.
```bash
cd LLaMA-Factory
```

Please refer to LLaMA-Factory/README.md for more details.
We provide the trained NeuDebugger models.
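A trained checkpoint can be loaded for debugging-style inference with Hugging Face transformers, as in the sketch below. The model path is a placeholder and the prompt format is an assumption; follow the repository's evaluation scripts for the exact usage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- replace with the actual NeuDebugger checkpoint
# directory or Hub ID of the released models.
model_path = "path/to/NeuDebugger"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# The prompt template here is illustrative, not the official one.
prompt = "Fix the bug in the following code:\ndef add(a, b):\n    return a - b"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```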
If you use DebugEval and find it helpful, please cite the paper and star this repository.
Feel free to contact 2301983@stu.neu.edu.cn or open an issue if you have any questions.
```bibtex
@misc{yang2025coastenhancingcodedebugging,
  title={COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis},
  author={Weiqing Yang and Hanbin Wang and Zhenghao Liu and Xinze Li and Yukun Yan and Shuo Wang and Yu Gu and Minghe Yu and Zhiyuan Liu and Ge Yu},
  year={2025},
  eprint={2408.05006},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2408.05006},
}
```