
Deep Dive: Building GPT from scratch - part 4

learning from Andrej Karpathy

Hello and welcome back to the series on Starter AI. I’m Miko, writing to you from Kyoto.

Today, we’re picking up makemore again, and we’re seeing how much we can improve the model by implementing an MLP (following the Bengio et al. (2003) neural language model paper) instead of the bigram approach from last time.
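To set the stage, here is a minimal sketch of the Bengio-style MLP architecture the lecture builds up to. All the names and sizes below (`C`, `W1`, `block_size`, etc.) are illustrative choices, not taken verbatim from the lecture: each character in a fixed-length context is mapped to a learned embedding, the embeddings are concatenated, passed through a tanh hidden layer, and a softmax over the vocabulary gives next-character probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 26 letters + '.' token, 3-character context,
# 2-dimensional embeddings, 100 hidden units.
vocab_size, block_size, emb_dim, hidden = 27, 3, 2, 100

C  = rng.normal(size=(vocab_size, emb_dim))            # embedding lookup table
W1 = rng.normal(size=(block_size * emb_dim, hidden))   # hidden-layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, vocab_size))             # output-layer weights
b2 = np.zeros(vocab_size)

def next_char_probs(context):
    """context: list of block_size character indices -> probabilities."""
    x = C[context].reshape(-1)           # look up and concatenate embeddings
    h = np.tanh(x @ W1 + b1)             # hidden layer
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = next_char_probs([0, 5, 13])      # e.g. context '.', 'e', 'm'
```

In training, the embedding table and both layers are all learned jointly by backpropagation; that sharing of embeddings across contexts is what lets the MLP generalize where the bigram counting table cannot.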

Grab a coffee (or two) - we’re going to need about an hour and a half. This part requires elements from all the previous lectures. If you’re new here, please start at part 1.

The roadmap

The goal of this series is to implement a GPT from scratch, and to actually understand everything needed to do that. We’re following Andrej’s Zero To Hero videos. If you missed a previous part, catch up here:

  1. Neural Networks & Backpropagation part 1 - 2024/02/09

  2. Neural Networks & Backpropagation part 2 - 2024/02/16

  3. Generative language model - bigrams - 2024/02/23

  4. Today: Generative language model - MLP

To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.

Generative language model - MLP

Today’s lecture is called “Building makemore Part 2: MLP”, and it picks up where we left off last time - a working bigram-based model with rather underwhelming performance. Spoiler alert: things are about to get better!
