Deep Dive: Building GPT from scratch - part 7

learning from Andrej Karpathy

Hello and welcome back to the series on Starter AI. It’s Miko again, from vaguely spring-like London.

Today, we’re back from the side quests, and we’re finally finishing up makemore. We’ll make the architecture deeper and then have an epiphany that the resulting code looks very much like WaveNet from the 2016 DeepMind paper. The lecture is shorter than usual, at just under an hour. Let’s do it!


The roadmap

The goal of this series is to implement a GPT from scratch, and to actually understand everything needed to do that. We’re following Andrej’s Zero To Hero videos. If you missed a previous part, catch up here:

To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.

Generative language model - WaveNet

Today’s lecture is called “Building makemore Part 5: Building a WaveNet” and it’s the final helping in our adventure implementing makemore. The goal today is to change the architecture of the neural network to include more layers, change the context size, and see whether that improves the model’s performance.
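To make "context size" concrete, here is a minimal sketch of how the training examples change when the window grows. It is plain Python and keeps the characters as strings for readability; the lecture works with integer indices in PyTorch tensors instead. The name `build_dataset` and the single-word input are my own choices for illustration.

```python
def build_dataset(words, block_size):
    """Return (contexts, targets) for next-character prediction."""
    contexts, targets = [], []
    for word in words:
        context = ["."] * block_size      # "." pads the start of every word
        for ch in word + ".":             # a trailing "." marks the end of the word
            contexts.append("".join(context))
            targets.append(ch)
            context = context[1:] + [ch]  # slide the window by one character
    return contexts, targets

contexts, targets = build_dataset(["emma"], block_size=8)
for c, t in zip(contexts, targets):
    print(c, "->", t)
```

With `block_size=8` the word "emma" yields five examples, from `........ -> e` to `....emma -> .`. The number of examples doesn't change with the context size, only how much history each one carries.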

We also get a little insider info on what the actual development process looks like for Andrej - the test harness, notebooks vs VS Code, trying out different hyperparameters, and more.

The lecture concludes with Andrej giving a brief overview of what a convolutional neural network looks like, and how it’s applied as a way of improving efficiency in the paper. The actual implementation is left for a future video.
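Since the implementation is left for later, here is only a rough sketch of what "dilated causal convolution" means, in plain Python rather than PyTorch. The function names, the kernel size of 2 and the dilations 1, 2, 4 are assumptions I picked so that the numbers line up with a context of 8; this is not code from the lecture.

```python
def causal_conv1d(x, weights, dilation=1):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x as zero before t=0."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:                  # causal: only look at the present and the past
                total += w * x[idx]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """How many input positions can influence a single output position."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Push a unit impulse at t=0 through three stacked layers with growing dilation.
h = [1.0] + [0.0] * 7
for d in (1, 2, 4):
    h = causal_conv1d(h, [1.0, 1.0], dilation=d)

print(h)                              # every position has been reached by the impulse
print(receptive_field(2, (1, 2, 4)))  # 8
```

Three layers are enough for the last output to see all 8 inputs, and the same small filters are reused at every position, which is where the efficiency comes from.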


The WaveNet paper is the inspiration for today’s changes. It proposes a deep neural network for generating raw audio, and was published in 2016 by a team of researchers at Google DeepMind. We’re only really going to change the layers to look more like the ones in the paper:

Figure 2 from the WaveNet paper (Google DeepMind, 2016)
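The tree shape in the figure boils down to fusing neighbouring positions a pair at a time instead of squashing the whole context in one go. Here is a minimal sketch of that grouping in plain Python, on nested lists. In the lecture this happens on PyTorch tensors, and each grouping step is followed by a linear layer, batch norm and tanh, all of which I skip here. The 2-dimensional embeddings are made-up values.

```python
def flatten_consecutive(seq, n):
    """Concatenate every n consecutive vectors in seq into one longer vector."""
    assert len(seq) % n == 0, "sequence length must be divisible by n"
    return [sum(seq[i:i + n], []) for i in range(0, len(seq), n)]

# 8 context positions, each with a hypothetical 2-dimensional embedding
embeddings = [[float(t), float(t) + 0.5] for t in range(8)]

level = embeddings
while len(level) > 1:
    level = flatten_consecutive(level, 2)
    print(len(level), "vectors of size", len(level[0]))
```

The 8 positions shrink to 4, then 2, then 1. The vectors double in size at each step here only because the sketch leaves out the linear layers that would project them back down.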

Video + timestamps

00:01:40 Pick up the code where we left it before the side quests

00:06:56 Beautify the learning rate plot

00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs

00:17:11 WaveNet paper overview

00:19:33 Context size increase to 8

00:21:36 implementing WaveNet

00:38:50 fixing batchnorm1d bug

00:45:21 re-training WaveNet with bug fix

00:46:07 scaling up our WaveNet

00:47:44 WaveNet but with “dilated causal convolutions”

00:51:34 torch.nn

00:52:28 the development process of building deep neural nets

00:54:17 going forward


Today we’ve seen Andrej guide us through implementing a deeper neural network, similar to the one described in the WaveNet paper, and fix the batch normalization layer so that it correctly handles inputs with more than two dimensions. We also saw the code become ever more closely aligned with the PyTorch API (torch.nn), with a couple of notable departures.
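To illustrate the kind of problem the batch norm fix addresses, here is a small sketch in plain Python. It assumes the input is laid out as [batch][position][channel] and compares averaging over the batch only with averaging over both batch and position. The function names and numbers are mine; in PyTorch the second variant roughly corresponds to reducing over dimensions (0, 1) rather than 0 alone.

```python
def batch_only_means(x):
    """Average over the batch only: one mean per (position, channel) pair."""
    n_pos, n_ch = len(x[0]), len(x[0][0])
    return [[sum(ex[p][c] for ex in x) / len(x) for c in range(n_ch)]
            for p in range(n_pos)]

def channel_means(x):
    """Average over batch and position together: one mean per channel."""
    n_ch = len(x[0][0])
    count = len(x) * len(x[0])
    sums = [0.0] * n_ch
    for example in x:
        for position in example:
            for c, value in enumerate(position):
                sums[c] += value
    return [s / count for s in sums]

# batch of 2 examples, 2 positions each, 2 channels per position
x = [[[1.0, 10.0], [3.0, 30.0]],
     [[5.0, 50.0], [7.0, 70.0]]]

print(batch_only_means(x))  # [[3.0, 30.0], [5.0, 50.0]]
print(channel_means(x))     # [4.0, 40.0]
```

The first version silently keeps separate statistics for every position, each estimated from fewer numbers. The second treats positions as extra batch entries, so each channel gets a single mean.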

We now have all the building blocks to get cracking on the original goal of the series - the Generative Pretrained Transformers!

What’s next

Next time we’ll be using all the knowledge we’ve acquired so far to work on an actual GPT. Finally!

As always, subscribe to this newsletter at starterai.dev to get the next parts in your mailbox!

Share with a friend

If you like this series, please forward to a friend!


How did you like it? Was it easy to follow? What should I change for the next time?

Please reach out on LinkedIn and let me know!
