Red teaming is the practice of proactively testing AI systems by attempting to elicit harmful, unintended, or unsafe outputs. Red teamers act as adversaries, trying to find vulnerabilities, jailbreaks, and failure modes before malicious actors do.
Red teaming is essential for understanding a model's actual safety boundaries versus its intended behavior.