Deep Dive: Building GPT from scratch - part 4

learning from Andrej Karpathy

Hello and welcome back to the series on Starter AI. I’m Miko, writing to you from Kyoto.

Today, we’re picking up makemore, and we’re learning how much we can improve the model by implementing an MLP (following the Bengio et al. 2003 MLP language model paper), instead of the bigram approach from last time.

Grab a coffee (or two) - we’re going to need about an hour and a half. This part requires elements from all the previous lectures. If you’re new here, please start at part 1.

The roadmap

The goal of this series is to implement a GPT from scratch, and to actually understand everything needed to do that. We’re following Andrej’s Zero To Hero videos. If you missed a previous part, catch up here:

  1. Neural Networks & Backpropagation part 1 - 2024/02/09

  2. Neural Networks & Backpropagation part 2 - 2024/02/16

  3. Generative language model - bigrams - 2024/02/23

  4. Today: Generative language model - MLP

To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.

Generative language model - MLP

Today’s lecture is called “Building makemore Part 2: MLP”, and it picks up where we left off last time - a working bigram-based model with rather underwhelming performance. Spoiler alert: things will be getting better now!

Instead of bigrams (looking at the previous character only), we’ll be building a multilayer perceptron (MLP) that you will recognise from part 2. Andrej will show us how to build it following the Bengio et al. 2003 MLP language model paper, how to train it, how to experiment with the various knobs at our disposal, and how to evaluate the performance of our model.
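To make the shape of the model concrete, here is a minimal sketch of a forward pass in the spirit of that paper: look up an embedding for each context character, concatenate them, pass them through a tanh hidden layer, and produce one score per possible next character. All the sizes and the `forward` helper are my own illustrative assumptions, not necessarily the exact code from the lecture.

```python
import torch

torch.manual_seed(0)

vocab_size = 27   # assumption: 26 letters plus one special token
block_size = 3    # assumption: number of previous characters used as context
emb_dim = 2       # assumption: size of each character embedding
hidden = 100      # assumption: number of hidden units

C = torch.randn(vocab_size, emb_dim)             # embedding lookup table
W1 = torch.randn(block_size * emb_dim, hidden)   # hidden layer weights
b1 = torch.randn(hidden)
W2 = torch.randn(hidden, vocab_size)             # output layer weights
b2 = torch.randn(vocab_size)

def forward(X):
    """Map a (batch, block_size) tensor of character indices to logits."""
    emb = C[X]                                          # (batch, block_size, emb_dim)
    h = torch.tanh(emb.view(X.shape[0], -1) @ W1 + b1)  # (batch, hidden)
    return h @ W2 + b2                                  # (batch, vocab_size)

X = torch.randint(0, vocab_size, (4, block_size))  # four made-up contexts
logits = forward(X)
probs = torch.softmax(logits, dim=1)               # one distribution per context
print(logits.shape)
```

Each row of `probs` is a probability distribution over the next character, which is what we sample from when generating names.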

Plus, we’ll also see a few important hints on how to use PyTorch to make our lives easier, now that we have learnt how to do the basics from scratch.
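One such hint is F.cross_entropy (it has its own timestamp in the video). The sketch below, on made-up logits and targets, compares a hand-rolled softmax plus negative log-likelihood with the built-in call. The tensor names are mine, chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 27)            # made-up scores for 5 examples
targets = torch.randint(0, 27, (5,))   # made-up next-character indices

# Manual version: softmax, then mean negative log-likelihood of the targets.
counts = logits.exp()
probs = counts / counts.sum(dim=1, keepdim=True)
manual_loss = -probs[torch.arange(5), targets].log().mean()

# Built-in version: the same quantity in a single call.
builtin_loss = F.cross_entropy(logits, targets)

print(manual_loss.item(), builtin_loss.item())
```

The two numbers should agree up to floating-point error. The built-in version avoids materialising the intermediate tensors and stays well-behaved when the logits are large, where a naive `exp()` would overflow.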


Today’s lecture doesn’t introduce an awful lot of new stuff - Andrej’s primarily reusing the knowledge from the previous lectures. Here are a few things that will be useful:

Tensor.view is an efficient way of changing the shape of a tensor: it reinterprets the same underlying storage with new dimensions, without allocating or copying memory.
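A small sketch of what that means in practice, using a made-up tensor of 18 numbers: the views below have different shapes, but they all share one block of memory.

```python
import torch

a = torch.arange(18)
b = a.view(3, 6)      # same 18 numbers, seen as 3 rows of 6
c = a.view(2, 3, 3)   # same numbers again, seen as a 3-D tensor

# All three tensors point at the same underlying memory.
same_memory = a.data_ptr() == b.data_ptr() == c.data_ptr()

# Because the storage is shared, writing through one view is visible in the others.
b[0, 0] = 100
print(same_memory, a[0].item(), c[0, 0, 0].item())
```

This is also why a view is cheap: only the shape metadata changes, never the data itself.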

Andrej’s also referencing this essay about the behemoth that is the PyTorch C++ codebase: http://blog.ezyang.com/2019/05/pytorch-internals/. It goes into the weeds to explain how tensors work under the hood (including what makes .view efficient), how the autograd engine works, and how to find your way around the codebase. Well worth a read, even though it’s from 2019.

Finally, in a Marvel-style, post-credits scene, Andrej’s also referencing Google Colab, which hosts Jupyter notebooks and requires no setup. The Colab notebook for this lecture is here.

Video + timestamps

00:01:48 Bengio et al. 2003 (MLP language model)

00:12:19 Embedding lookup table

00:18:35 Hidden layer + internals of torch.Tensor: storage, views

00:29:15 Output layer

00:29:53 Loss function

00:32:17 NN overview

00:32:49 PyTorch’s F.cross_entropy

00:37:56 Training loop, overfitting one batch

00:41:25 Mini-batching

00:45:40 How to find a learning rate that works

00:53:20 Splitting dataset into train, dev/validation and test subsets

01:00:49 Experiment: larger hidden layer

01:05:27 Visualising character embeddings

01:07:16 Experiment: larger embedding size

01:11:46 Summary

01:13:24 Bloopers: sampling from the model

01:14:55 Post-credits: Google Colab
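Two of the steps listed above, mini-batching and the learning-rate search, can be sketched together. One common way to look for a workable learning rate is to sweep candidates that are evenly spaced in the exponent, take one mini-batch step at each, and record the loss. The snippet below is a sketch on random stand-in data with assumed sizes, so the loss curve it produces carries no real signal; the point is the mechanics.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up data standing in for the real dataset: 1000 contexts of 3 character indices.
X = torch.randint(0, 27, (1000, 3))
Y = torch.randint(0, 27, (1000,))

# Tiny MLP parameters (sizes are illustrative assumptions).
C = torch.randn(27, 2)
W1 = torch.randn(6, 50)
b1 = torch.randn(50)
W2 = torch.randn(50, 27)
b2 = torch.randn(27)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad = True

# Candidate learning rates spaced evenly in the exponent, from 0.001 to 1.
lre = torch.linspace(-3, 0, 200)
lrs = 10 ** lre

losses = []
for i in range(200):
    ix = torch.randint(0, X.shape[0], (32,))      # sample a mini-batch of 32 rows
    h = torch.tanh(C[X[ix]].view(-1, 6) @ W1 + b1)
    loss = F.cross_entropy(h @ W2 + b2, Y[ix])

    for p in params:
        p.grad = None                             # reset gradients
    loss.backward()
    for p in params:
        p.data += -lrs[i] * p.grad                # one step at this candidate rate

    losses.append(loss.item())

print(len(losses), lrs[0].item(), lrs[-1].item())
```

On real data, plotting `losses` against `lre` gives a rough picture of where the loss falls fastest and where it starts to blow up, which is a sensible region to pick a learning rate from.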


We’ve designed, implemented, trained and experimented with a language model that’s beginning to produce name-like words that sound vaguely convincing. We’ve reused a bunch of things from the previous lectures, and we’ve seen how to evaluate the model in a somewhat scientific way. Well done, we’re closer to where we want to be!
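As a reminder of what “somewhat scientific” looks like in code, here is a minimal sketch of splitting a dataset into three parts. The 80/10/10 proportions, the seed and the placeholder `words` list are assumptions for illustration.

```python
import random

words = [f"name{i}" for i in range(100)]   # placeholder for the real list of names

random.seed(42)
random.shuffle(words)

n1 = int(0.8 * len(words))   # end of the training split
n2 = int(0.9 * len(words))   # end of the dev/validation split

train_words = words[:n1]     # used to fit the parameters
dev_words = words[n1:n2]     # used to compare hyperparameter settings
test_words = words[n2:]      # held back and used sparingly at the very end

print(len(train_words), len(dev_words), len(test_words))
```

Keeping the test split out of day-to-day tuning is what lets the final number say something about how the model generalises.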

What’s next

Andrej leaves us with a challenge - play around with all the knobs to get the validation loss below 2.17. Can you manage?

Next time, we’ll stay with the MLP, and spend some time understanding activations and gradients during training. And yes, we’ll make the model better too.

As always, subscribe to this newsletter at starterai.dev to get the next parts in your mailbox!

Share with a friend

If you like this series, please forward to a friend!


How did you like it? Was it easy to follow? What should I change for the next time?

Please reach out on LinkedIn and let me know!
