Cross-Attention is an attention mechanism where queries come from one sequence (e.g., decoder) while keys and values come from a different sequence (e.g., encoder). This enables models to attend to relevant parts of an input when generating output.
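Concretely, cross-attention reuses the standard scaled dot-product attention; the only difference is where the projections come from. As a sketch of the usual formulation (with X_dec and X_enc denoting decoder and encoder hidden states, and W_Q, W_K, W_V learned projection matrices):

$$
Q = X_{\text{dec}} W_Q, \qquad K = X_{\text{enc}} W_K, \qquad V = X_{\text{enc}} W_V
$$

$$
\mathrm{CrossAttention}(X_{\text{dec}}, X_{\text{enc}}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
$$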
Unlike self-attention (where Q, K, V all come from the same sequence), cross-attention bridges two different representations, enabling tasks like translation, image captioning, and retrieval-augmented generation.
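A minimal single-head sketch in NumPy may make the data flow concrete. The function and variable names (`cross_attention`, `x_dec`, `x_enc`) are illustrative assumptions, not from any particular library; the point is simply that queries are projected from one sequence while keys and values are projected from the other.

```python
import numpy as np

def cross_attention(x_dec, x_enc, w_q, w_k, w_v):
    """Single-head cross-attention: queries from x_dec, keys/values from x_enc."""
    q = x_dec @ w_q   # (T_dec, d_k) queries come from the decoder sequence
    k = x_enc @ w_k   # (T_enc, d_k) keys come from the encoder sequence
    v = x_enc @ w_v   # (T_enc, d_v) values come from the encoder sequence

    # Similarity of each decoder position to each encoder position.
    scores = q @ k.T / np.sqrt(k.shape[-1])                    # (T_dec, T_enc)

    # Softmax over encoder positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each decoder position becomes a weighted mix of encoder values.
    return weights @ v                                          # (T_dec, d_v)

# Toy example: 4 decoder tokens attend over 6 encoder tokens.
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 8, 8
x_dec = rng.normal(size=(4, d_model))
x_enc = rng.normal(size=(6, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d)) for d in (d_k, d_k, d_v))

out = cross_attention(x_dec, x_enc, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one output vector per decoder position
```

Setting `x_enc = x_dec` in this sketch recovers ordinary self-attention, which highlights that the two mechanisms differ only in which sequence supplies the keys and values.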