
Deep Dive: Building GPT from scratch - part 8

learning from Andrej Karpathy

Hello and welcome back to this deep dive series on Starter AI!

Today, we’re finally jumping into the Generative Pretrained Transformer (GPT) to generate infinite Shakespeare on demand, re-implementing nanoGPT. You’ll be glad to see all the skills from the previous seven parts come into play now!

Due to popular demand, we’re breaking this lecture into two one-hour sessions. Enjoy!

The roadmap

The goal of this series is to implement a GPT from scratch, and to understand everything needed to do that. We’re following Andrej’s Zero To Hero videos. Catch up here:

To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.

GPT - groundwork for tiny Shakespeare

Today’s lecture is called “Let's build GPT: from scratch, in code, spelled out”, and we’re tackling the first hour.

We’re building a small GPT that we’ll train on Shakespeare’s works (a toy dataset under 1 MB, so about a million characters). By the end, we’ll have re-implemented nanoGPT, which in turn is a rewrite of minGPT.
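The lecture opens by loading that dataset and building the simplest possible tokenizer: one integer per character. Here’s a minimal sketch of that step, assuming the input.txt from Karpathy’s char-rnn repo (the same file nanoGPT’s Shakespeare prep script downloads):

```python
# Load the tiny Shakespeare dataset and build a character-level vocabulary.
import urllib.request

url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
text = urllib.request.urlopen(url).read().decode("utf-8")
print(f"dataset length: {len(text):,} characters")  # roughly a million

# Character-level tokenizer: every distinct character gets an integer id.
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> string

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(decode(encode("hii there")))  # round-trips to "hii there"
```

A character-level vocabulary is tiny (65 symbols for this dataset), which keeps the embedding table small; real GPTs use subword tokenizers like BPE instead.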

NanoGPT reproduces GPT-2 124M on OpenWebText (OpenAI’s actual training set is not public, so this is a best-effort reproduction of the dataset), and it also lets you load OpenAI’s official weights. All that in about 300 lines of PyTorch for the model, plus another 300 lines for the boilerplate training loop. Easily one of the coolest repos on GitHub at the moment.
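To make “loading the official weights” concrete, here’s a sketch using the Hugging Face transformers library, which hosts the same GPT-2 checkpoints that nanoGPT’s GPT.from_pretrained converts into its own model (the prompt and sampling settings below are just illustrative):

```python
# Load the official GPT-2 124M weights via Hugging Face transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # "gpt2" is the 124M variant
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M")  # ~124M

# Sample a short continuation from the pretrained model.
ids = tokenizer("To be, or not to be", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=20, do_sample=True, top_k=50)
print(tokenizer.decode(out[0]))
```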
