Working memory in LLM systems is the information actively held in the context window during a conversation or task. Like human working memory, it has limited capacity (a fixed token budget) but provides immediate access to recent information.
The context window holds the conversation history, any retrieved documents, and intermediate reasoning steps: everything the model can "see" when generating its next response.
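
To make the capacity limit concrete, here is a minimal sketch of a working-memory buffer with a fixed token budget. Everything in it is an illustrative assumption rather than a real library's API: the `WorkingMemory` class, the `estimate_tokens` helper (a crude word count standing in for a real tokenizer), and the oldest-first eviction policy, which is only one of several ways a system might decide what to drop.

```python
from collections import deque
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


@dataclass
class WorkingMemory:
    """Illustrative buffer for everything the model can "see" on its next turn."""

    max_tokens: int
    items: deque = field(default_factory=deque)

    def total_tokens(self) -> int:
        return sum(estimate_tokens(text) for _, text in self.items)

    def add(self, kind: str, text: str) -> None:
        # kind labels the entry, e.g. "user", "retrieved", or "reasoning".
        self.items.append((kind, text))
        # Assumed policy: drop the oldest entries until the budget is met,
        # but always keep the entry that was just added.
        while self.total_tokens() > self.max_tokens and len(self.items) > 1:
            self.items.popleft()

    def render(self) -> str:
        # Flatten the buffer into the text that would be placed in the context window.
        return "\n".join(f"[{kind}] {text}" for kind, text in self.items)


memory = WorkingMemory(max_tokens=12)
memory.add("user", "What is the capital of France?")
memory.add("retrieved", "Paris is the capital of France.")
memory.add("reasoning", "The document answers the question directly.")
print(memory.render())
```

With a budget of 12 estimated tokens, adding the third entry pushes the total over the limit, so the oldest entry (the user turn) is evicted. Real systems often prefer to summarize or selectively retain older content instead of discarding it outright; the oldest-first rule here is chosen only for brevity.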