Let’s build GPT: from scratch, in code, spelled out.

We build a Generative Pre-trained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and the basics of tensors and PyTorch nn, which we take for granted in this video.

Links:
- Google colab for the video: https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing
- GitHub repo for the video: https://github.com/karpathy/ng-video-lecture
- Playlist of the whole Zero to Hero series so far: https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
- nanoGPT repo: https://github.com/karpathy/nanoGPT
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:
- Attention is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/
- The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand …

Original video: https://www.youtube.com/watch?v=kCc8FmEb1nY