GAIL Seminar Series: The industrialization of Large Language Model training

BigScience, an open-science initiative, demonstrated the feasibility of creating a 176-billion-parameter Large Language Model (LLM). It served as a proof of concept that such ambitious undertakings were possible. Yet, two and a half years later, only a handful of 100-billion+ parameter models exist, despite significant interest from developers, researchers, investors, and governments.

In his GAIL Seminar, Dr Matthias Gallé from Cohere will argue that LLM training is undergoing a process of "industrialization," where developing cutting-edge models demands not only specialized knowledge but also a robust organizational ecosystem. Contrary to initial assumptions, this evolution hinges on large, coordinated teams with dedicated resources.

This industrialization is reshaping the landscape of AI research and development, creating new challenges and opportunities. Dr Gallé will illustrate one such emerging research avenue, model merging, and highlight its potential to substantially improve models in this new reality.