The decoder, step by step
Read in order — each section builds on the one before it. Toggle layers (Concept / Maths / Code) inside any section to control how deep the explanation goes.
- 01 Overview
What a decoder does. Tokens in, logits out.
Token lookup + sinusoidal positional encoding.
Q/K/V, scores, mask, softmax, output. The headline act.
Position-wise W₂·GELU(W₁·x + b₁) + b₂.
Why residuals; why LayerNorm; pre-norm vs post-norm.
What changes layer-to-layer.
- 07 Sampling
Logits → next token. Greedy, temperature, top-k, top-p.
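
As a preview of where the sequence ends up, the four sampling strategies named in the last section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's own implementation: the function name `sample_next_token` and its parameters are hypothetical, `temperature=0` is treated as greedy decoding by convention, and `top_k` is assumed to be at least 1 when given.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick a token id from raw logits (hypothetical helper for illustration)."""
    rng = rng or random.Random()

    # Greedy decoding: by convention here, temperature 0 means "take the argmax".
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate token ids, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (assumes top_k >= 1).
    if top_k is not None:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    if top_p is not None:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept

    # random.choices normalises the weights of the surviving tokens.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]
```

For example, `sample_next_token([1.0, 3.0, 2.0], temperature=0)` picks index 1, the largest logit, while `top_k=2` restricts the random draw to indices 1 and 2. Filtering on the already temperature-scaled probabilities is one common ordering; other implementations apply the filters in a different sequence.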