LLM Optimization Techniques
The world of transformer-based architectures is racing to develop ever larger and more capable models. Models like GPT-2, once considered so large and advanced that releasing them widely was seen as potentially harmful [1, 2], are now viewed as small by today’s standards, where models run into the billions of parameters. Research teams at OpenAI, Google, and elsewhere have consistently delivered increasingly powerful models, driven by the idea that “larger models are better” [3, 4]. But was it really just about scale all along?
With models now trained on internet-scale data and requiring enormous hardware resources, what lies ahead? Concerns around environmental impact, affordability, and accessibility are pushing researchers to explore more efficient ways to achieve similar, or even better, performance.
In this chapter, we will cover:
- Motivations behind the need to...