
Stop Sequences

Foundational [2/5]
Also known as: Stop tokens, Stop strings, End sequences

Definition

Stop sequences are specific strings or tokens that end generation as soon as they appear in the model's output. They give explicit control over where a response ends, in addition to the default end-of-sequence (EOS) token.

Stop sequences are essential for structured output, preventing the model from continuing past the desired response boundary.

Key Concepts

  • Multiple stops: Several stop sequences can be specified in one request
  • String matching: Generation stops as soon as any one stop string is produced; matching is exact
  • Not included: The stop sequence itself is typically excluded from the returned output
  • EOS token: The built-in end token acts as an implicit stop sequence
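
The matching rules listed above can be sketched in a few lines of Python. This is an illustrative client-side approximation, not how any particular API implements stopping; `apply_stop_sequences` is a hypothetical helper name, and the sketch assumes the stop string is excluded from the result.

```python
from typing import Optional, Sequence, Tuple


def apply_stop_sequences(text: str, stops: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Cut `text` at the earliest occurrence of any stop string.

    Returns the truncated text and the stop string that matched (or None).
    The matched stop string is not included in the returned text.
    """
    best_index = len(text)
    matched = None
    for stop in stops:
        if not stop:  # ignore empty strings, which would match everywhere
            continue
        index = text.find(stop)  # exact, case-sensitive match
        if index != -1 and index < best_index:
            best_index = index
            matched = stop
    return text[:best_index], matched


raw = "AI: Hello! How can I help?\nHuman: Thanks!\nAI: You're welcome!"
reply, hit = apply_stop_sequences(raw, ["Human:", "User:"])
print(repr(reply))  # 'AI: Hello! How can I help?\n'
print(hit)          # Human:
```

Because matching is exact, a stop of "human:" would not fire on this text.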

Examples

Use Cases
Common Stop Sequence Patterns
CHAT/CONVERSATION: Stop when the AI would start a new turn

    stop=["Human:", "User:", "\n\nHuman"]

    Without stop:
        "AI: Hello! How can I help?
         Human: Thanks!
         AI: You're welcome!
         Human: ..."                    ← model generates user turns!

    With stop (stops at "Human:"):
        "AI: Hello! How can I help?"    ← clean response

CODE GENERATION: Stop at function boundaries

    stop=["\ndef ", "\nclass ", "```"]

    Prompt: "Write a function to add numbers"
    Output: "def add(a, b):\n    return a + b"
    (stops before generating another function; the leading "\n" keeps
    the stop from firing on the first "def " of the answer itself)

JSON EXTRACTION: Stop at closing brace

    stop=["}"]

    Output: {"name": "John", "age": 30
    (stops at the first "}"; because the stop sequence is typically
    excluded, append "}" before parsing. Suitable for flat objects only,
    since a nested object would be cut at its inner brace.)

STRUCTURED OUTPUT: Stop at section markers

    stop=["---", "###", "END"]

Q&A FORMAT: Stop before the next question

    stop=["Question:", "Q:"]

    Ensures only one answer is generated
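
For the JSON pattern above, note that a stop sequence is typically excluded from the returned text, so output cut at "}" usually arrives without its closing brace. A minimal sketch of the repair step follows, assuming a flat object with no nested braces and no "}" inside string values; `parse_json_cut_at_brace` is a hypothetical helper name.

```python
import json


def parse_json_cut_at_brace(truncated: str) -> dict:
    """Re-append the '}' that the stop sequence removed, then parse.

    Assumes the model produced a single flat JSON object.
    """
    return json.loads(truncated.strip() + "}")


truncated_output = '{"name": "John", "age": 30'
record = parse_json_cut_at_brace(truncated_output)
print(record["name"], record["age"])  # John 30
```

If the API in use does include the stop string in its output, the extra "}" makes parsing fail with a `json.JSONDecodeError`, which is a useful signal to check during testing.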
Implementation
Using Stop Sequences
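OPENAI API:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": prompt}
        ],
        stop=["Human:", "\n\n---", "END"],
        max_tokens=500,
    )

ANTHROPIC API:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-opus-20240229",
        messages=[...],
        stop_sequences=["Human:", "\n\nUser:"],
        max_tokens=500,
    )

HUGGINGFACE:

    # Using eos_token_id for a single-token stop.
    # add_special_tokens=False avoids picking up a BOS token; [-1] skips
    # any leading prefix piece. Check the ids for your tokenizer.
    newline_id = tokenizer.encode("\n", add_special_tokens=False)[-1]
    output = model.generate(
        input_ids,
        eos_token_id=newline_id,
    )

    # For string stops, check the decoded text during generation
    # or use stopping_criteria. Recent transformers releases also
    # accept stop_strings=[...] together with tokenizer=tokenizer.

PRACTICAL PATTERNS:

    # ReAct agent - stop after action
    stop=["Observation:"]

    # JSON mode
    stop=["}"]      # simple
    stop=["}\n"]    # with newline

    # Code blocks
    stop=["```"]

    # Numbered lists (stop after one item);
    # the leading "\n" avoids matching "2." inside text such as "2.5"
    stop=["\n2.", "\n2)"]

    # Conversation
    stop=["\nUser:", "\nHuman:", "\n\n"]

GOTCHAS:

    - Stop sequences are CASE SENSITIVE
    - Whitespace matters! "\n\n" ≠ "\n \n"
    - Test thoroughly with edge cases
    - Some APIs limit the number of stop sequences
      (OpenAI, for example, accepts at most 4)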
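
Checking for string stops during generation has one subtlety: with streaming, a stop string can be split across two chunks. The sketch below shows one way to handle that in plain Python by holding back a short tail of the text. It assumes the chunks are already decoded strings; `stream_until_stop` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, Iterator, Sequence


def stream_until_stop(chunks: Iterable[str], stops: Sequence[str]) -> Iterator[str]:
    """Yield text from `chunks`, ending just before the first stop string.

    The last (longest_stop - 1) characters are held back on each step so that
    a stop string split across two chunks is caught before it is emitted.
    """
    stops = [s for s in stops if s]
    hold = max((len(s) for s in stops), default=1) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        hits = [i for i in (buffer.find(s) for s in stops) if i != -1]
        if hits:
            cut = min(hits)  # earliest stop wins
            if cut:
                yield buffer[:cut]
            return
        safe = len(buffer) - hold  # text that can no longer start a stop
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:  # stream ended without hitting a stop
        yield buffer


pieces = ["Hello", " wor", "ld\nHu", "man: hi"]
print("".join(stream_until_stop(pieces, ["\nHuman:"])))  # Hello world
```

The trade-off is a small delay: up to `hold` characters are withheld until the next chunk shows they are not the start of a stop string.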

Interactive Exercise

Design Stop Sequences

You want to generate exactly ONE paragraph of text. What stop sequences would you use?

Pro Tips
  • Test stop sequences with actual model output, not assumptions
  • Include newline variations: "\n\n", "\n \n", "\r\n\r\n"
  • For agents, stop at "Observation:" to allow tool execution
  • Combine with max_tokens as a backup limit
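
The agent tip above can be illustrated with a toy loop. This is a sketch under stated assumptions: `fake_generate` is a hypothetical stand-in for a real model call that honours stop sequences, and the `Action: name[args]` format and the `TOOLS` table are made up for the example. The point is that stopping at "Observation:" discards the model's invented observation so the real tool result can be inserted instead.

```python
from typing import Callable, Dict, List


def fake_generate(prompt: str, stop: List[str]) -> str:
    """Hypothetical stand-in for an LLM call that honours stop sequences."""
    if "Observation:" not in prompt:
        # Without a stop, the model would invent its own (wrong) observation.
        text = ("Thought: I need to add the numbers.\n"
                "Action: add[2, 3]\n"
                "Observation: 6\nFinal Answer: 6")
    else:
        text = "Thought: The tool returned the sum.\nFinal Answer: 5"
    cut = min((i for i in (text.find(s) for s in stop) if i != -1), default=len(text))
    return text[:cut]


TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda args: str(sum(int(x) for x in args.split(","))),
}


def run_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):  # step cap acts as a backup limit
        step = fake_generate(prompt, stop=["Observation:"])
        prompt += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse "Action: name[args]" from the last line of the step.
        action_line = step.strip().splitlines()[-1]
        call = action_line.split("Action:", 1)[1].strip().rstrip("]")
        name, args = call.split("[", 1)
        # Insert the real tool output where the model would have guessed.
        prompt += f"Observation: {TOOLS[name](args)}\n"
    return "no answer"


print(run_agent("What is 2 + 3?"))  # 5
```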
