Alignment refers to training AI systems to behave in accordance with human intentions, values, and preferences. An aligned model is helpful, harmless, and honest: it does what users want while avoiding harmful outputs.
Alignment is a central challenge in AI safety: the goal is to ensure that increasingly powerful models remain beneficial and under human control.
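One common way to train on human preferences is to fit a reward model on pairwise comparisons between responses, as in RLHF-style pipelines. The sketch below shows the standard Bradley-Terry pairwise loss used for this; the function and variable names are illustrative, not drawn from any particular library.

```python
# Minimal sketch of a pairwise preference loss (Bradley-Terry style),
# as used when training reward models on human comparisons.
# Names here are illustrative assumptions, not a specific API.

import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Push the reward model to score human-preferred responses above
    rejected ones: loss = -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar reward scores for a batch of (chosen, rejected) pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.5, 1.1])
print(preference_loss(chosen, rejected).item())
```

The resulting reward model can then guide further fine-tuning (for example via reinforcement learning), steering the policy toward outputs humans prefer.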