
Thus Spake Long-Context Large Language Model

Read the Chinese version

This repository provides a collection of papers and resources on long-context LLMs, covering architecture, infrastructure, training, and evaluation. For a clear taxonomy and more insights into the methodology, please refer to our survey: Thus Spake Long-Context Large Language Model. The survey gives a global picture of the lifecycle of long-context LLMs from four perspectives: architecture (length extrapolation, KV cache optimization, memory management, and architecture innovation), infrastructure (training and inference infrastructure), training (long-context pre-training, long-context post-training, and long-context MLLMs, mainly long VideoLLMs), and evaluation (long-context evaluation), showcasing the full spectrum of long-context technologies. The survey closes with 10 unanswered questions currently faced by long-context LLMs.
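As a small taste of the techniques catalogued here, below is a minimal sketch of one representative length-extrapolation method: linear position interpolation applied to rotary position embeddings (RoPE). This is an illustrative example, not a method from the survey itself; the function names and the 4k-to-16k setting are our own assumptions.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """RoPE rotation angles with linear position interpolation.

    scale > 1 compresses position indices, so a model trained on
    seq_len / scale tokens can attend over seq_len tokens while
    keeping positions inside the range seen during training.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale  # linear interpolation
    return torch.outer(positions, inv_freq)            # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate query/key feature pairs by the precomputed angles.

    x: (seq_len, head_dim) with head_dim even.
    """
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: extend a model trained with a 4k context to 16k tokens (scale = 4).
angles = rope_angles(head_dim=64, seq_len=16384, scale=4.0)
q = torch.randn(16384, 64)
q_rot = apply_rope(q, angles)
```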

We welcome suggestions from peers for improving this paper list or the survey, and we are committed to updating the repository regularly.

If you would like your paper to be included, or to suggest any modifications to the survey and repository, please feel free to open an issue or send an email to xrliu24@m.fudan.edu.cn. We sincerely appreciate your collaboration!

If you find our survey useful for your research, please consider citing the following paper:

@misc{liu2025spakelongcontextlargelanguage,
      title={Thus Spake Long-Context Large Language Model}, 
      author={Xiaoran Liu and Ruixiao Li and Mianqiu Huang and Zhigeng Liu and Yuerong Song and Qipeng Guo and Siyang He and Qiqi Wang and Linlin Li and Qun Liu and Yaqian Zhou and Xuanjing Huang and Xipeng Qiu},
      year={2025},
      eprint={2502.17129},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17129}, 
}

News

  • [2025.03.12] 🎉🎉🎉 We collect the papers and blogs mentioned in the survey.
  • [2025.02.27] 🎉🎉🎉 We release an introduction video about our survey on Bilibili.
  • [2025.02.26] 🎉🎉🎉 We release the presentation slides on GitHub.
  • [2025.02.25] 🎉🎉🎉 Our paper receives the #1 Paper of the Day on Hugging Face.
  • [2025.02.24] 🎉🎉🎉 We release the first version of the paper on arXiv and Hugging Face.
  • [2025.01.29] 🎉🎉🎉 We release the first version of the paper on GitHub.

Table of Contents

  • Paper List
      • Survey & Technical Report
      • Architecture
          • Length Extrapolation
          • KV Cache Optimization
          • Memory Management
          • Architecture Innovation
      • Infrastructure
          • Training Infrastructure
          • Inference Infrastructure
      • Training
          • Long-Context Pre-Training
          • Long-Context Post-Training
          • Long-Context MLLM
      • Evaluation
          • Long-Context Evaluation
      • Unanswered Questions
