transformer-explainer

/learn

The decoder, step by step

Read in order — each section builds on the one before it. Toggle layers (Concept / Maths / Code) inside any section to control how deep the explanation goes.

  1. What a decoder does. Tokens in, logits out.

  2. Token lookup + sinusoidal positional encoding.

  3. Q/K/V, scores, mask, softmax, output. The headline act.

  4. Position-wise feed-forward network: W₂·GELU(W₁·x + b₁) + b₂.

  5. Why residuals; why LayerNorm; pre-norm vs post-norm.

  6. What changes layer-to-layer.

  7. Logits → next token. Greedy, temperature, top-k, top-p.
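
The seven sections above can be sketched as one tiny single-layer decoder forward pass. This is an illustrative toy, not the explainer's actual code: all sizes (`d_model`, `d_ff`, etc.) and weights are made up, the weights are random rather than learned, and the output head is assumed to be tied to the embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_ff, T = 50, 16, 32, 6   # toy sizes (assumptions)

def positional_encoding(T, d):
    # Section 2: sinusoidal positions — even dims get sin, odd dims get cos.
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def layer_norm(x, eps=1e-5):
    # Section 5: normalize each position's features to zero mean, unit variance.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def masked_self_attention(x, Wq, Wk, Wv):
    # Section 3: Q/K/V, scaled scores, causal mask, softmax, weighted sum of V.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores[mask] = -np.inf            # each token may only attend to its left
    return softmax(scores) @ V

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, W1, b1, W2, b2):
    # Section 4: position-wise W2·GELU(W1·x + b1) + b2.
    return gelu(x @ W1 + b1) @ W2 + b2

def sample(logits, temperature=1.0, top_k=None):
    # Section 7: temperature rescales the logits; top-k keeps only the k best.
    z = logits / temperature
    if top_k is not None:
        z = np.where(z < np.sort(z)[-top_k], -np.inf, z)
    return int(rng.choice(len(z), p=softmax(z)))

# Random stand-ins for learned weights.
E = rng.normal(size=(vocab, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

tokens = rng.integers(0, vocab, size=T)
x = E[tokens] + positional_encoding(T, d_model)           # sections 1-2
x = x + masked_self_attention(layer_norm(x), Wq, Wk, Wv)  # section 3 (pre-norm + residual, section 5)
x = x + ffn(layer_norm(x), W1, b1, W2, b2)                # sections 4-5
logits = layer_norm(x) @ E.T                              # section 7: tied output head
next_token = int(np.argmax(logits[-1]))                   # greedy decoding
```

A real model stacks this block N times (section 6) and runs several attention heads in parallel; everything else is exactly this shape.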