Text generation with GPT
OpenAI has been in the spotlight for quite some time because of its newsworthy work, such as GPT, GPT-2, and GPT-3 (and also InstructGPT, GPT-3.5, and GPT-4, along with the viral sensation ChatGPT, but these are a bit different and are covered in subsequent chapters). In this section, we will briefly discuss GPT architectures up to GPT-3. We will then use a pretrained version of GPT-2 for our text generation task.
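As a preview of that task, the following is a minimal sketch of generating text with a pretrained GPT-2 model, assuming the Hugging Face transformers library and PyTorch are installed; the "gpt2" checkpoint name, the prompt, and the sampling parameters are illustrative choices rather than the exact setup used later in the chapter.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pretrained GPT-2 tokenizer and language-model head (assumed checkpoint: "gpt2").
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode an illustrative prompt into token IDs (PyTorch tensors).
prompt = "The transformer architecture has"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Autoregressively generate a continuation of up to 50 tokens using top-k/top-p sampling.
output_ids = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Each generated token is sampled from the model's predicted distribution over the vocabulary and appended to the context before the next prediction, which is the autoregressive decoding behavior we rely on throughout this section.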
Generative pre-training: GPT
The first model in this series is called GPT, or Generative Pre-Training. It was released in 2018, around the same time as BERT. The paper presents a task-agnostic architecture based on the ideas of transformers and unsupervised learning. The GPT model was shown to beat several benchmarks, such as GLUE and SST-2, although its performance was soon overtaken by BERT, which was released shortly afterward.
GPT is essentially a language model based on the transformer-decoder we presented previously. Since a language model can be trained...