From the course: Introduction to Large Language Models
Scaling laws
- [Instructor] Imagine what things were like in the tech space in 2020. The transformer architecture, introduced in 2017, was proving to be better than anything before it. It was a time of significant experimentation. Some companies were focusing on the decoder portion, others on the encoder, and others were trying to figure out how they could make the models even better. And it was at this time that the research team at OpenAI suggested that the performance of large models was a function of the number of model parameters, the size of the dataset the models were trained on, and the total amount of compute available for training. They performed several experiments on language models to back up their claim. Let's take a look at some of the results. On the y-axis is the test loss, and a lower test loss indicates that the model is performing better. Along the x-axis is the number of parameters in the model. So you can see that the…
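To make the parameter-count relationship concrete, here is a minimal Python sketch of the power-law form reported in that scaling-laws work, where test loss falls as the number of parameters grows. The exponent and normalizing constant below are illustrative assumptions for the sketch, not figures quoted in the video.

```python
# Minimal sketch of a scaling-law style power law: test loss L(N) shrinks
# as the parameter count N grows, roughly L(N) ~ (N_c / N) ** alpha.
# ALPHA and N_C are assumed, illustrative constants, not published values.
ALPHA = 0.076    # assumed power-law exponent for parameter count
N_C = 8.8e13     # assumed normalizing constant (in parameters)

def predicted_test_loss(num_parameters: float) -> float:
    """Predicted test loss for a model with the given parameter count."""
    return (N_C / num_parameters) ** ALPHA

# Larger models yield a lower predicted test loss.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {predicted_test_loss(n):.2f}")
```

Running the loop shows the qualitative trend from the plot described above: each order-of-magnitude increase in parameters lowers the predicted test loss by a roughly constant factor.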