From the course: Introduction to Large Language Models
Scaling laws
- [Instructor] Imagine what things were like in the tech space in 2020. The transformer architecture, introduced in 2017, was proving to be better than anything before it. It was a time of significant experimentation. Some companies were focusing on the decoder portion, others on the encoder, and others were trying to figure out how they could make the models even better. And it was at this time that the research team at OpenAI suggested that the performance of large models was a function of the number of model parameters, the size of the dataset the models were trained on, and the total amount of compute available for training. They performed several experiments on language models to back up their claim. Let's take a look at some of the results. On the y-axis is the test loss, and a lower test loss indicates that the model is performing better. Along the x-axis is the number of parameters in the model. So you can see that the…
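To make the parameter-count relationship concrete, here is a minimal Python sketch of the power-law form reported in that scaling-laws work, where test loss falls as the number of parameters grows. The exponent and normalizing constant below are illustrative assumptions for the sketch, not figures quoted in the video.

```python
# Minimal sketch of a scaling-law style power law: test loss L(N) shrinks
# as the parameter count N grows, roughly L(N) ~ (N_c / N) ** alpha.
# ALPHA and N_C are assumed, illustrative constants, not published values.
ALPHA = 0.076    # assumed power-law exponent for parameter count
N_C = 8.8e13     # assumed normalizing constant (in parameters)

def predicted_test_loss(num_parameters: float) -> float:
    """Predicted test loss for a model with the given parameter count."""
    return (N_C / num_parameters) ** ALPHA

# Larger models yield a lower predicted test loss.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {predicted_test_loss(n):.2f}")
```

Running the loop shows the qualitative trend from the plot described above: each order-of-magnitude increase in parameters lowers the predicted test loss by a roughly constant factor.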