Why optimize?
The chapters so far have shown that training large, billion-parameter models is far more complex than just importing a few libraries and pressing Run. Building and utilizing these large models demands a series of precise steps that go beyond data science and deep learning—it requires substantial engineering effort. But the challenges don’t end there.
Training large models involves intensive manual work: curating datasets, setting up training infrastructure with servers powered by thousands of GPUs, and consuming a significant amount of electricity!5, 6 For instance, Google’s PaLM reportedly cost around USD 27 million in training expenses alone:

Figure 9.1: A tweet on X.com discussing the estimated cost of training LLMs like LLaMA and PaLM (source: X.com7)
To get a better idea of how costly it can be to train an LLM, let’s walk through a back-of-the-envelope calculation.
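As a rough sketch, the dominant cost driver is compute: a common approximation is that training requires about 6 FLOPs per model parameter per training token. The snippet below turns that into a dollar figure using assumed, illustrative numbers (not taken from the text): a PaLM-scale model of roughly 540B parameters trained on roughly 780B tokens, an effective sustained throughput of about 100 TFLOP/s per accelerator, and a cloud price of about USD 3 per accelerator-hour.

```python
# Back-of-the-envelope estimate of LLM pre-training cost.
# All numbers below are illustrative assumptions, not sourced figures.

params = 540e9                    # assumed model size: ~540B parameters (PaLM-scale)
tokens = 780e9                    # assumed training set: ~780B tokens

# Common approximation: ~6 FLOPs per parameter per token (forward + backward pass)
total_flops = 6 * params * tokens

gpu_flops_per_s = 100e12          # assumed sustained throughput per accelerator (~100 TFLOP/s)
gpu_hours = total_flops / gpu_flops_per_s / 3600

price_per_gpu_hour = 3.0          # assumed cloud price in USD per accelerator-hour
cost_usd = gpu_hours * price_per_gpu_hour

print(f"Total compute:  {total_flops:.2e} FLOPs")
print(f"GPU-hours:      {gpu_hours:,.0f}")
print(f"Estimated cost: ${cost_usd:,.0f}")
```

With these assumed inputs, the estimate lands in the low tens of millions of dollars, the same order of magnitude as the figure quoted above; changing the throughput or hourly price assumptions shifts the result accordingly.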
These costs are purely for educational purposes and...