Self-attention is attention applied within a single sequence: each position attends to all other positions in that same sequence. Unlike cross-attention, which relates two different sequences (e.g., decoder positions attending to encoder outputs), self-attention captures relationships within the input text itself.
Because every token can attend directly to every other token, regardless of distance, the model captures global context across the whole sequence.
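To make this concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The names (`self_attention`, `w_q`, `w_k`, `w_v`) are illustrative, not from any particular library; the key point is that queries, keys, and values are all projected from the same sequence, so the attention-weight matrix compares every position with every other.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q = x @ w_q                                   # queries come from the sequence...
    k = x @ w_k                                   # ...as do the keys...
    v = x @ w_v                                   # ...and the values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (seq_len, seq_len): every position vs. every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                            # each output mixes information from all positions

# Toy usage: 4 tokens, model dim 8, head dim 4
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 4)
```

Note that `scores` has shape (seq_len, seq_len), which is where the all-pairs, distance-independent interaction comes from: row i holds position i's compatibility with every position in the sequence, including distant ones.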