Build an LLM from scratch - Sebastian Raschka
- Victor Hugo Germano
- May 2
- 2 min read

In the last two weeks I dedicated myself to the book Build a Large Language Model (From Scratch), by Sebastian Raschka. I set myself the goal of completing the book's implementation so that I could deepen my understanding of the mechanisms behind LLMs, and thus have more authority to criticize, on technical grounds, what I have been talking about here for a long time. Since I was on spring break from my master's degree, it was a great pastime!
Sebastian Raschka has produced a great tutorial on the implementation of Transformers and the attention mechanism that is well worth exploring! I really enjoyed the book. I took the opportunity to share my notes and the code I wrote while following the book on GitHub, for anyone interested in learning more about the technical side of probabilistic text generators.
I believe this is a very important topic for the current moment - although I may be a little late to the subject, since we are reaching the trough of disillusionment with the capabilities of LLMs. And there is increasing evidence that this model will not advance without putting the entire planet at risk of catastrophe. Focusing purely on scaling computing power for AI is a path with no real future: not only for the technology, but for the planet.

I was really impressed by the concept of attention and by the data processing required to compose a suitable training dataset. Quite ingenious! A working knowledge of matrix operations also helps a lot to deepen the understanding.
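To make the matrix view concrete, here is a minimal sketch of scaled dot-product self-attention in PyTorch. This is my own illustration, not the book's exact code; the shapes and variable names are assumptions chosen for readability:

```python
import torch

torch.manual_seed(123)

# Toy example: a sequence of 6 tokens, each embedded in 4 dimensions.
# (Illustrative shapes of my own choosing, not taken from the book.)
inputs = torch.randn(6, 4)

d_out = 4
# Trainable projections that turn each embedding into query, key, and value vectors.
W_query = torch.nn.Parameter(torch.randn(4, d_out))
W_key   = torch.nn.Parameter(torch.randn(4, d_out))
W_value = torch.nn.Parameter(torch.randn(4, d_out))

queries = inputs @ W_query   # (6, d_out)
keys    = inputs @ W_key     # (6, d_out)
values  = inputs @ W_value   # (6, d_out)

# One matrix multiplication scores every token against every other token.
attn_scores = queries @ keys.T                              # (6, 6)
attn_weights = torch.softmax(attn_scores / d_out**0.5, dim=-1)

# Each output row is a weighted mix of all the value vectors.
context = attn_weights @ values                             # (6, d_out)
print(context.shape)  # torch.Size([6, 4])
```

Once you see attention as three matrix multiplications and a softmax, the whole mechanism stops feeling like magic.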
The book is super detailed and the author's own repository has several different references and solutions. I recommend it!
Even so, I found myself going back countless times to the material created by Grant Sanderson (3Blue1Brown) to sharpen my own mental model of how LLMs work and how the attention process takes place.
The visualization of the concepts he achieves in his video series is simply phenomenal.
If you want to go deeper, there is a one-hour talk where he presents the concepts in more depth: Visualizing transformers and attention | Talk for TNG Big Tech Day '24
It took me two weeks to work through the book, implement the examples, and solve some of the problems that came up along the way. PyTorch is a great tool that simplifies the work a lot!
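As one example of what PyTorch takes off your plate: a complete training step needs almost no boilerplate, because autograd computes every gradient for you. A toy sketch (the model, data, and hyperparameters here are placeholders of my own, not from the book):

```python
import torch

# Tiny stand-in model and fake batch, just to show the training loop shape.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 4)                # batch of 8 fake inputs
target = torch.randint(0, 2, (8,))   # fake class labels

logits = model(x)
loss = torch.nn.functional.cross_entropy(logits, target)

loss.backward()       # gradients for all parameters, no manual calculus
optimizer.step()      # one parameter update
optimizer.zero_grad() # reset gradients for the next batch
```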
I published everything I learned in a project on GitHub, which may be useful for anyone who wants to know more about the terminology and implementation details:
LLM deep dive - https://github.com/victorhg
It's great to be back to programming in Python :D