Advances in model development
As we’ve seen, LLMs have emerged from the fundamental transformer architecture.1 These foundation models are trained to predict the next token in a sequence or a masked token within the input sequence.2 Afterward, they can be augmented with instruction- or chat-based fine-tuning,3,4 which builds on the model’s ability to generate language through supervised training on particular objectives or on turn-based dialogue. These supervised objectives can be further enhanced with reinforcement learning, most notably Reinforcement Learning from Human Feedback (RLHF),5 in which a reward model is learned from human-annotated preferences and the LLM is then optimized to maximize that learned reward.
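To make these two training signals concrete, the following is a minimal sketch in PyTorch, using randomly generated toy tensors in place of real model outputs; the shapes and names (vocab_size, reward_chosen, and so on) are illustrative assumptions rather than any particular system’s implementation. It shows the next-token cross-entropy objective used in pretraining and the pairwise preference loss (a Bradley–Terry style objective) commonly used to fit a reward model for RLHF.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 8, 4  # toy sizes, purely illustrative

# --- (1) Next-token prediction -------------------------------------------
# `logits` stands in for a language model's output at every position; the
# training target at position t is simply the token at position t + 1.
tokens = torch.randint(0, vocab_size, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)

pretrain_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),               # the "next token" at each position
)

# --- (2) Reward-model loss from human preferences ------------------------
# For a pair of responses to the same prompt, annotators mark one as
# preferred; the reward model is trained so the preferred one scores higher.
reward_chosen = torch.randn(batch, requires_grad=True)    # score of preferred response
reward_rejected = torch.randn(batch, requires_grad=True)  # score of rejected response

preference_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()

print(f"next-token loss: {pretrain_loss.item():.3f}, "
      f"preference loss: {preference_loss.item():.3f}")
```

In a full RLHF pipeline the learned reward is then used as the optimization target for the language model itself, typically with a policy-gradient method; the snippet above covers only the two supervised losses described in the text.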
Improving these basic ingredients is an area of active research, both in how models are trained and in the architectures themselves.
As we’ve covered, the training of an LLM can be principally divided into the foundational pretraining phase, in which the model learns to generate text, and the fine-tuning phase. Below we discuss...