Inference is the process of using a trained model to generate outputs from inputs. When you send a prompt to an LLM and get a response, the model is performing inference. It's the "using" phase as opposed to the "training" phase.
Inference happens every time you interact with ChatGPT, Claude, or any AI assistant—the model applies what it learned during training to your specific request.
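The training/inference split can be sketched with a toy one-parameter model (an assumed illustrative example, not an LLM): parameters are adjusted during training, then frozen, and inference is just applying them to new inputs.

```python
# Hedged sketch: a toy linear model y = w*x, trained once, then used
# for inference. LLM inference is the same idea at vastly larger scale.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples from y = 2x

# --- Training phase: adjust the weight to fit the data ---
w = 0.0
for _ in range(200):
    for x, y in data:
        grad = 2 * (w * x - y) * x   # gradient of squared error
        w -= 0.01 * grad             # gradient-descent update

# --- Inference phase: the weight is frozen; just compute outputs ---
def infer(x: float) -> float:
    """Apply the learned (now fixed) parameter to a new input."""
    return w * x

print(round(infer(5.0), 2))  # applies what was learned to a new input
```

The key point the sketch makes concrete: inference performs no parameter updates, only a forward computation with weights fixed at whatever training produced.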