nanochat is Karpathy’s attempt to strip LLM training down to its bare essentials. It’s about 8,000 lines of code, and it’s designed to be read and understood, not just run. Unlike the big, complicated frameworks you find in production, this one is all about showing you how things work, step by step. It’s the final project for his LLM101n course, and it’s meant to teach, not just impress.
nanochat walks you through every stage of the pipeline: building your own tokenizer (in Rust, no less), pretraining on a large corpus of web text, midtraining on conversations so the model learns to chat, supervised fine-tuning, and, if you want, reinforcement learning. You even get to see how inference works, with tricks like KV caching laid bare.
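To make the KV-caching idea concrete, here is a toy sketch (not nanochat's actual code): during autoregressive decoding, the keys and values for every previous token are stored once and reused, so each new token only computes attention against the cache instead of reprocessing the whole prefix. The class and function names here are illustrative, and real implementations work on batched tensors per attention head.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, keys, values):
    # Dot-product attention of one query against all cached keys/values.
    scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(scores, values)) for d in range(dim)]

class KVCache:
    """Append-only cache: each decode step adds one key/value pair,
    so the prefix's keys and values are never recomputed."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)

cache = KVCache()
out1 = cache.step([1.0, 0.0], [1.0, 0.0], [0.5, 0.5])  # first token
out2 = cache.step([0.0, 1.0], [0.0, 1.0], [0.2, 0.8])  # reuses cached k/v
print(len(cache.keys))  # the cache now holds both tokens' keys
```

The payoff is that decoding step t costs attention over t cached entries rather than re-running the transformer over all t tokens from scratch.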
If you run the whole thing on a beefy 8xH100 node, you get to watch your model grow up in real time. After four hours (about $100), you’ve got a model that’s just learning its ABCs. Give it twelve hours ($300), and it’s already outpacing GPT-2 on some tests. Push it to $1,000 (about 42 hours), and suddenly it can solve simple math, write a bit of code, and do better on multiple choice questions. The exact numbers depend on the model size and training budget you pick, but you get the idea: you see the whole learning curve, not just the end result.
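As a sanity check on those figures, the implied node-hour rate is roughly consistent across the tiers. A quick back-of-envelope sketch (the tier labels are mine, and real cloud pricing for an 8xH100 node varies by provider):

```python
# (dollars, hours) for each training tier quoted above
tiers = {
    "speedrun": (100, 4),
    "GPT-2 beater": (300, 12),
    "top tier": (1000, 42),
}

for name, (dollars, hours) in tiers.items():
    rate = dollars / hours  # implied $/hour for the whole 8-GPU node
    print(f"{name}: ${dollars} over {hours}h, ~${rate:.0f}/h")
```

All three work out to roughly $24-25 per node-hour, so the tiers really are the same hardware rented for longer, not different machines.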
So who is this for? It’s for the people who want to really understand how LLMs are trained, not just press a button and hope for the best. If you’re a researcher, a grad student with access to serious GPUs, or an engineer who wants to see every tradeoff laid out in plain sight, this is for you. And if you’re teaching a course on LLMs, you finally have something you can actually show your students, not just wave your hands and point at black boxes.
But let’s be clear: this isn’t for building the next ChatGPT. If you just want results, you’re better off fine-tuning an existing model with something like Axolotl or Hugging Face Trainer; you’ll get more bang for your buck. nanochat also assumes real resources and background: you’ll need access to cloud GPUs (those 8xH100 nodes don’t come cheap), and you should know your way around PyTorch and distributed training. What you’re really buying here isn’t a product, but an education.
The real value is in seeing how everything fits together. Most LLM frameworks hide the important decisions behind layers of abstraction, so you never really know why things are built the way they are. With nanochat, every stage is out in the open, in code you can actually read. You can tinker, experiment, and finally understand how chat models work from the ground up, without getting lost in a maze of production code.