transformer-explainer

/learn

The decoder, step by step

Read in order — each section builds on the one before it. Toggle layers (Concept / Maths / Code) inside any section to control how deep the explanation goes.

  1. What a decoder does. Tokens in, logits out.

  2. Token lookup + sinusoidal positional encoding.

  3. Q/K/V, scores, mask, softmax, output. The headline act.

  4. Position-wise feed-forward network: W₂·GELU(W₁·x + b₁) + b₂.

  5. Why residuals; why LayerNorm; pre-norm vs post-norm.

  6. What changes layer-to-layer.

  7. Logits → next token. Greedy, temperature, top-k, top-p.
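
The seven sections above can be sketched as one tiny single-layer decoder forward pass. This is an illustrative toy, not the explainer's actual code: all sizes (`d_model`, `d_ff`, etc.) and weights are made up, the weights are random rather than learned, and the output head is assumed to be tied to the embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_ff, T = 50, 16, 32, 6   # toy sizes (assumptions)

def positional_encoding(T, d):
    # Section 2: sinusoidal positions — even dims get sin, odd dims get cos.
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def layer_norm(x, eps=1e-5):
    # Section 5: normalize each position's features to zero mean, unit variance.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def masked_self_attention(x, Wq, Wk, Wv):
    # Section 3: Q/K/V, scaled scores, causal mask, softmax, weighted sum of V.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    mask = np.triu(np.ones((len(x), len(x)), dtype=bool), k=1)
    scores[mask] = -np.inf            # each token may only attend to its left
    return softmax(scores) @ V

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, W1, b1, W2, b2):
    # Section 4: position-wise W2·GELU(W1·x + b1) + b2.
    return gelu(x @ W1 + b1) @ W2 + b2

def sample(logits, temperature=1.0, top_k=None):
    # Section 7: temperature rescales the logits; top-k keeps only the k best.
    z = logits / temperature
    if top_k is not None:
        z = np.where(z < np.sort(z)[-top_k], -np.inf, z)
    return int(rng.choice(len(z), p=softmax(z)))

# Random stand-ins for learned weights.
E = rng.normal(size=(vocab, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

tokens = rng.integers(0, vocab, size=T)
x = E[tokens] + positional_encoding(T, d_model)           # sections 1-2
x = x + masked_self_attention(layer_norm(x), Wq, Wk, Wv)  # section 3 (pre-norm + residual, section 5)
x = x + ffn(layer_norm(x), W1, b1, W2, b2)                # sections 4-5
logits = layer_norm(x) @ E.T                              # section 7: tied output head
next_token = int(np.argmax(logits[-1]))                   # greedy decoding
```

A real model stacks this block N times (section 6) and runs several attention heads in parallel; everything else is exactly this shape.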