The feed-forward network (FFN) is a component of each transformer layer that processes every token independently through two linear transformations with a non-linear activation between them; the first projection typically expands the hidden dimension (often by a factor of four) and the second contracts it back to the model dimension. FFNs provide much of the transformer's capacity for learning complex patterns, and they are widely believed to store much of the model's factual knowledge.
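The two-projection structure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular model's implementation: the dimensions, ReLU activation, and weight initialization are arbitrary choices for the example (production transformers commonly use GELU or gated variants).

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # First linear projection expands the hidden dimension (d_model -> d_ff)
    h = x @ W1 + b1
    # Non-linear activation (ReLU here; GELU is common in practice)
    h = np.maximum(h, 0.0)
    # Second linear projection contracts back down (d_ff -> d_model)
    return h @ W2 + b2

# Toy dimensions (hypothetical): d_model=4, d_ff=16, 3 tokens
rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 4, 16, 3
W1 = rng.standard_normal((d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d_model))
b2 = np.zeros(d_model)

tokens = rng.standard_normal((n_tokens, d_model))
out = ffn(tokens, W1, b1, W2, b2)
print(out.shape)  # (3, 4): output has the same shape as the input
```

Because the FFN contains no interaction across rows, each token's output depends only on that token's input vector, which is what "processes each token independently" means in practice.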