Concept
After attention has mixed information across positions, the feed-forward network (FFN) processes each position on its own. It's a small two-layer MLP, applied identically at every sequence position.
You can think of attention as the who-talks-to-whom sub-layer, and the FFN as the what-each-position-thinks sub-layer. Most of the parameters in a Transformer live here — typically four times as many as in attention — and most of the model's "knowledge" sits in these matrices.
The network has three steps:
- Up-projection to a wider dimension d_ff (here d_ff = 2 · d_model).
- GELU non-linearity, which lets the FFN model curved decision surfaces. Without it the whole model would collapse to a single linear layer.
- Down-projection back to d_model, producing the contribution that gets added to the residual stream.
Click any position below to see its three vectors light up. The bars to the right of centre are positive values, to the left are negative. The pre-activation and after-GELU charts share width — notice how GELU softly zeroes most of the bars.
Maths
For one position with input x (a d_model-dimensional vector):
pre = x · W1 + b1 // [d_ff]
act = GELU(pre) // [d_ff]
out = act · W2 + b2 // [d_model]
We use the tanh approximation of GELU (the variant GPT-2 uses):
GELU(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))
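The tanh approximation translates directly into code. A minimal standalone sketch (this is not the site's `geluMat` helper, just the scalar formula):

```typescript
// Tanh approximation of GELU, the variant GPT-2 uses.
function gelu(x: number): number {
  const c = Math.sqrt(2 / Math.PI); // ≈ 0.7978845608
  return 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3)));
}

// GELU acts as a soft gate: large negative inputs are squashed
// towards 0, large positive inputs pass through almost unchanged.
console.log(gelu(-3)); // close to 0
console.log(gelu(0));  // exactly 0
console.log(gelu(3));  // close to 3
```

This soft zeroing is exactly what the after-GELU bars in the widget show: most pre-activations end up near zero, and only a few pass through.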
The same W1, b1, W2, b2 are applied at every sequence position
independently — that's why we say "position-wise". Mathematically this
is identical to a 1-D convolution with a kernel size of 1.
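One way to see the position-wise property concretely: run the FFN on one row at a time and note that each output row depends only on its matching input row. A toy sketch with made-up weights (d_model = 2, d_ff = 4; the names and shapes here are illustrative, not the site's actual helpers):

```typescript
type Vec = number[];

function gelu(x: number): number {
  const c = Math.sqrt(2 / Math.PI);
  return 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3)));
}

// y = x · W + b for a single row; W is laid out [in][out].
function affine(x: Vec, W: number[][], b: Vec): Vec {
  return b.map((bj, j) => bj + x.reduce((s, xi, i) => s + xi * W[i][j], 0));
}

// The full FFN for one position: up-project, GELU, down-project.
function ffnRow(x: Vec, W1: number[][], b1: Vec, W2: number[][], b2: Vec): Vec {
  return affine(affine(x, W1, b1).map(gelu), W2, b2);
}

// Toy weights: d_model = 2, d_ff = 4.
const W1 = [[1, 0, -1, 0.5], [0, 1, 0.5, -1]];
const b1 = [0, 0, 0, 0];
const W2 = [[1, 0], [0, 1], [0.5, 0.5], [-0.5, 0.5]];
const b2 = [0, 0];

// Two positions, each processed with the SAME weights, entirely
// independently — no information flows between them here.
const posA = ffnRow([1, 2], W1, b1, W2, b2);
const posB = ffnRow([2, 1], W1, b1, W2, b2);
```

Because every position goes through the identical `ffnRow`, sliding these weights along the sequence is the same computation as a kernel-size-1 convolution over it.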
Code
// src/lib/transformer/ffn.ts (excerpt)
export function ffn(x: Matrix, w: FFNWeights, trace?: FFNTrace): Matrix {
  const pre = addRowBias(matmul(x, w.W1), w.b1);
  const act = geluMat(pre);
  const out = addRowBias(matmul(act, w.W2), w.b2);
  if (trace) {
    trace.pre = pre;
    trace.act = act;
    trace.out = out;
  }
  return out;
}
The whole FFN is three named lines. The widget above hits
/api/compute/ffn, which calls this function with the same weights the
attention page used.
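The excerpt leans on three helpers that aren't shown. A minimal sketch of what they might look like, assuming `Matrix` is `number[][]` (these signatures are assumptions, not the repo's actual code):

```typescript
type Matrix = number[][];

// Plain row-by-column matrix multiply: a is [n][k], b is [k][m].
function matmul(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// Add the bias vector to every row — one bias, broadcast over positions.
function addRowBias(m: Matrix, bias: number[]): Matrix {
  return m.map(row => row.map((v, j) => v + bias[j]));
}

// Element-wise GELU (tanh approximation) over the whole matrix.
function geluMat(m: Matrix): Matrix {
  const c = Math.sqrt(2 / Math.PI);
  return m.map(row =>
    row.map(x => 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * x ** 3))))
  );
}
```

Note how `addRowBias` and `geluMat` both map over rows independently — the position-wise property falls straight out of the helper structure.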