
Deep Dive: Building GPT from scratch - part 2

learning from Andrej Karpathy

Hello and welcome back to the Deep Dive into GPTs series on Starter AI. It’s Miko again, and today we’re continuing learning from Andrej Karpathy.

Last week, in part 1 of the series, we learned what a neural network is, and we followed Andrej’s explanation to calculate the gradients by hand. Today, we’re building on that to finish the implementation of micrograd. Buckle up, it’s going to be super fun!

Also, did you hear that Andrej quit OpenAI (again!) to work on some of his own projects? Looking forward to seeing what comes out of it!

Andrej’s tweet on Valentine’s day

Good luck to you, Andrej! As for us, let’s get learning!

The roadmap

The goal of this series is to implement a GPT from scratch, and to actually understand everything needed to do that. We’re following Andrej’s Zero To Hero videos, taking detours so that software engineers without any AI exposure can understand the concepts he skips over. If you missed a previous part, you can catch up here:

  1. Neural Networks & Backpropagation part 1 - 2024/02/09

  2. Today: Neural Networks & Backpropagation part 2

To follow along, subscribe to the newsletter at starterai.dev. You can also follow me on LinkedIn.

Neural Networks & Backpropagation - part 2

Today we’re finishing Andrej’s lecture called “The spelled-out intro to neural networks and backpropagation: building micrograd”.

Last time we saw what a neural network actually is, and we watched Andrej manually calculate the gradients for backpropagation. Today, we'll be implementing the actual guts of micrograd.

Before we jump into the lecture, here’s some extra context for the video.

Context

Directed acyclic graph (DAG). The type of graph we’re using to represent the calculations.

Topological sort. Produces an ordering of the graph in which all the children of a node come before the node itself. Wiki has pseudocode for a couple of algorithms.
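
To make this concrete, here’s a minimal sketch of the kind of ordering micrograd’s backward pass relies on (the Value/_prev/_backward names follow micrograd’s conventions, but this is a simplified illustration rather than the exact lecture code):

```python
def topological_order(root):
    # Depth-first walk of the computation DAG: a node is appended
    # only after all of its children (operands) have been appended.
    order, visited = [], set()

    def build(node):
        if node not in visited:
            visited.add(node)
            for child in node._prev:   # _prev holds the nodes this one was computed from
                build(child)
            order.append(node)

    build(root)
    return order

# Backpropagation then walks the order in reverse, so every node's gradient
# is fully known before it is pushed down to its children:
#
#   for node in reversed(topological_order(loss)):
#       node._backward()
```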

Tanh (hyperbolic tangent function). Wolfram. Squashes its input into the range (-1, 1); used here as the neuron’s activation function.
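
For intuition, here’s a tiny plain-Python version (nothing beyond the standard library) showing that tanh always lands strictly between -1 and 1:

```python
import math

def tanh(x):
    # tanh(x) = (e^(2x) - 1) / (e^(2x) + 1), always strictly between -1 and 1
    e2x = math.exp(2 * x)
    return (e2x - 1) / (e2x + 1)

print(tanh(0.0))    # 0.0
print(tanh(10.0))   # ~0.9999999..., large inputs saturate near the edges
```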

Multilayer perceptron (MLP). Scary-sounding name for a neural network of fully connected neurons, with at least 3 layers (input, hidden, output) and a non-linear activation function.
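
If the name sounds abstract, here’s a rough sketch of the shape of an MLP in plain Python (random parameters, purely to show the structure; not the lecture’s Neuron/Layer/MLP classes):

```python
import math
import random

def neuron(x, w, b):
    # one fully connected neuron: weighted sum of the inputs plus a bias, squashed by tanh
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(x, weights, biases):
    # a layer is just several neurons, each looking at the same inputs
    return [neuron(x, w, b) for w, b in zip(weights, biases)]

# a tiny 3 -> 4 -> 1 network with random parameters
sizes = [(3, 4), (4, 1)]
params = [
    ([[random.uniform(-1, 1) for _ in range(nin)] for _ in range(nout)],
     [random.uniform(-1, 1) for _ in range(nout)])
    for nin, nout in sizes
]

x = [2.0, 3.0, -1.0]            # the input layer is just the raw inputs
for weights, biases in params:
    x = layer(x, weights, biases)
print(x)                         # a single output value in (-1, 1)
```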

Stochastic gradient descent (SGD). Python algo. An inexact but practical way of finding the model parameters that best fit the data.
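
The core update step is almost embarrassingly simple; here’s a sketch (the ‘stochastic’ part just means the gradients are estimated on a small random batch of data rather than the whole dataset):

```python
def sgd_step(params, grads, lr=0.01):
    # nudge every parameter a small step against its gradient,
    # i.e. in the direction that locally decreases the loss
    return [p - lr * g for p, g in zip(params, grads)]

# example: parameters [0.5, -0.3] with gradients [0.2, -0.1]
print(sgd_step([0.5, -0.3], [0.2, -0.1], lr=0.1))  # roughly [0.48, -0.29]
```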

Cross entropy loss function. What a ‘real’ neural network might use as its loss.
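
Here’s what cross entropy looks like in plain Python (a simplified illustration; the lecture itself sticks to a simpler loss):

```python
import math

def cross_entropy(logits, target_index):
    # softmax turns the raw scores (logits) into probabilities that sum to 1...
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # ...and the loss is the negative log-probability assigned to the correct class
    return -math.log(probs[target_index])

print(cross_entropy([2.0, 1.0, 0.1], target_index=0))  # ~0.42: confident and correct -> small loss
print(cross_entropy([0.1, 1.0, 2.0], target_index=0))  # ~2.32: confident and wrong -> large loss
```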

Video + timestamps

01:09:02 Implementing the backward propagation for the DAG

01:22:28 Handling nodes that are accessed multiple times in the DAG

01:27:05 Implementing other operations, like minus, exp, divide

01:39:31 Comparison with PyTorch - spoiler: it works!

01:43:55 Building neural nets in micrograd

01:51:04 What’s a loss function in ML?

01:57:56 What are the parameters in a neural network?

02:01:12 Training the network manually

02:10:23 Andrej being awesome, explaining a bug

02:14:03 Differences against ‘real’ systems

02:16:46 Walkthrough of the final micrograd code on GitHub + unit test vs PyTorch + more complicated demo notebook

02:21:10 Looking into the implementation of PyTorch, the real, production code

Summary

Today, we learned how to implement backpropagation with more mathematical operations, found out that micrograd’s API looks suspiciously like PyTorch’s, and saw that the results for the example DAG match the ones from PyTorch.

Then we moved on to build an actual MLP and trained it on a very simple problem. Andrej discussed some of the differences, but a “real” system works fundamentally the same way.
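
For reference, the training loop we ended up with has a shape you’ll recognise in every framework. This is a paraphrase from memory rather than the exact lecture code, and it assumes a micrograd-style mlp plus some training data xs (inputs) and ys (targets):

```python
for step in range(20):
    # forward pass: run every input through the network and compute one scalar loss
    ypred = [mlp(x) for x in xs]
    loss = sum((yp - yt) ** 2 for yp, yt in zip(ypred, ys))

    # backward pass: clear the old gradients, then backpropagate through the DAG
    for p in mlp.parameters():
        p.grad = 0.0
    loss.backward()

    # update: take a small gradient-descent step on every parameter
    for p in mlp.parameters():
        p.data += -0.05 * p.grad

    print(step, loss.data)
```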

What’s next

Next time we’ll start on makemore: a generative neural network that predicts the next chunk of text. It’s the next step towards our GPT!

Next episodes will be sent through the newsletter at starterai.dev, so subscribe if you haven’t already.

Share with a friend

If you like this series, please forward to a friend!

Feedback

How did you like it? Was it easy to follow? What should I change for the next time?


Please reach out on LinkedIn and let me know!
