Summary
This chapter presented the key concepts that have proven pivotal for the language modeling paradigm. We started with a recap of the transformer architecture and the typical approach of pretraining a large model and then fine-tuning it for specific tasks. We also touched on the limitations of such models in terms of task alignment. The chapter then gave an overview of an extended training setup that adds instruction tuning followed by RLHF, improving not just alignment but overall model performance. The subsequent sections provided detailed commentary on each of these topics, along with hands-on exercises to instruction-tune a GPT-2 model to translate English news headlines into German and to align a GPT-2 model with PPO so that it generates mostly positive movie reviews.
The chapter closed with a brief discussion of how this extended training setup kick-started the era of LLMs and a sneak...