Attention is a mechanism that lets a model focus on the relevant parts of its input when producing each part of its output. It computes a weighted combination of the input representations, where each weight reflects how much "relevance" or "attention" the model assigns to that input element for the output currently being generated.
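As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product attention, the variant used in transformers, which computes softmax(QK^T / sqrt(d_k)) V. The function name, argument shapes, and returned values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return a weighted combination of value vectors (illustrative sketch).

    Q: (n_queries, d_k) query vectors
    K: (n_keys, d_k)    key vectors
    V: (n_keys, d_v)    value vectors
    """
    d_k = Q.shape[-1]
    # Relevance score between each query and every key
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into attention weights that sum to 1 per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted combination of the value vectors
    return weights @ V, weights

# Toy usage: 2 queries attending over 3 input elements
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(output.shape, attn.shape)  # (2, 8) (2, 3)
```

The weights matrix makes the "focus" explicit: each row shows how strongly one query attends to each input element, and the output is simply the inputs' value vectors averaged under those weights.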
Attention is the core innovation behind transformers and modern LLMs, enabling them to capture long-range dependencies in text.