Summary
In this chapter, we introduced some of the core ideas that dominate recent NLP models, such as the attention mechanism, contextual embeddings, and self-attention. We used this foundation to learn about the transformer architecture and its internal components, and we presented an overview of the different transformer-based architecture families. We then briefly discussed BERT and its family of architectures, covered three different NLP tasks, and explored how the performance of pretrained models differs from that of fine-tuned ones. In the next section of the chapter, we turned to the decoder-only transformer language models from OpenAI and covered the architectural and dataset-related choices behind GPT and GPT-2. We leveraged the transformers package from Hugging Face to develop our own GPT-2-based text generation pipeline. Finally, we closed the chapter with a brief discussion of GPT-3. We discussed the various motivations behind developing such a huge model and its long...
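For quick reference, a minimal sketch of such a GPT-2 text generation pipeline is shown below; the prompt and generation settings are illustrative rather than the exact values used in the chapter.

# A minimal sketch of a GPT-2 text generation pipeline using the
# Hugging Face transformers library; the prompt and generation
# settings are illustrative, not the chapter's exact values.
from transformers import pipeline

# Load a text generation pipeline backed by the pretrained GPT-2 model
generator = pipeline("text-generation", model="gpt2")

# Generate a single continuation of the given prompt
outputs = generator(
    "Transformers have changed NLP because",  # illustrative prompt
    max_length=40,
    num_return_sequences=1,
)
print(outputs[0]["generated_text"])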