About this project
This is a graphical, interactive explainer of the Transformer decoder. Type your own tokens, watch every operation execute, and sample the next token end-to-end. Every matrix you see is the actual value the server computed — the visualisations are not a separate model.
It exists for two readers. End users learning the Transformer get a step-by-step walk-through with three layers of depth (concept, maths, code) that they can toggle inside any section. Readers of the source get a real-world Next.js full-stack codebase that reads like a textbook, with shapes, formulas, and references in the JSDoc of every operation.
Built with
- Next.js 14 (App Router)
- TypeScript (strict)
- Tailwind CSS
- D3.js
- Drizzle ORM + Postgres
- Auth.js v5
- next-mdx-remote
- Vitest + Playwright
References
- Vaswani et al., 2017 — Attention Is All You Need. The original Transformer paper. The decoder block structure, sinusoidal positional encoding, and scaled dot-product attention in this codebase follow this paper directly; note that the original uses post-norm, while this codebase uses the pre-norm ordering popularised by GPT-2.
- Radford et al., 2019 — Language Models are Unsupervised Multitask Learners (GPT-2). The decoder-only architecture and pre-LayerNorm pattern follow this work.
- Hendrycks & Gimpel, 2016 — Gaussian Error Linear Units (GELUs). The tanh approximation used in `lib/transformer/gelu.ts`.
- Andrej Karpathy — nanoGPT and “Let's build GPT”. Reference implementations checked against during development.
- 3blue1brown — Neural Networks series. Pedagogical inspiration for the visual intuition layer.
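As an illustration of the GELU reference above, the tanh approximation can be sketched in a few lines of TypeScript. This is a minimal sketch of the formula from Hendrycks & Gimpel (2016), not the actual contents of `lib/transformer/gelu.ts`:

```typescript
// GELU, tanh approximation (Hendrycks & Gimpel, 2016):
//   gelu(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
const SQRT_2_OVER_PI = Math.sqrt(2 / Math.PI);

function gelu(x: number): number {
  return 0.5 * x * (1 + Math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)));
}

// Applied element-wise, e.g. to a row of feed-forward pre-activations.
// Large positive inputs pass through almost unchanged; large negative
// inputs are squashed towards zero.
const activated = [-3, 0, 3].map(gelu);
```

Unlike ReLU, the curve is smooth through zero, which is why the approximation needs `tanh` rather than a simple `max`.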
Licence
MIT. Use the code as a learning resource, fork it, ship it. See LICENSE for the text.