Semantic Search

Difficulty: Intermediate (3/5)
Also known as: Vector search, Neural search, Meaning-based search

Definition

Semantic search finds results based on meaning rather than exact keyword matches. It converts both the query and the content into vector embeddings and matches them by similarity, so it can capture the intent behind a query and surface relevant content even when the wording is completely different.

It powers the retrieval step of modern RAG systems and lets AI assistants find relevant information even when you don't know the exact terms.

Key Concepts

  • Meaning over keywords: Understands concepts, not just words
  • Similarity scoring: Returns results ranked by relevance
  • Embedding-powered: Uses vector representations of text
  • Cross-lingual: Can match across languages when the embedding model is multilingual

Examples

Keyword vs Semantic
Search Comparison
Query: "How do I fix a broken heart?"

KEYWORD SEARCH (traditional):
  → Matches documents containing "fix", "broken", "heart"
  → Results: Cardiology articles, home repair guides
  → ❌ Misses the emotional meaning

SEMANTIC SEARCH:
  → Understands: emotional healing, breakup recovery
  → Results: Relationship advice, mental health resources
  → ✓ Captures the actual intent

Semantic search understands context and intent, not just individual words.
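
To make the keyword side of this comparison concrete, here is a minimal sketch of naive keyword matching in Python. The scoring function and the two sample documents are made up for illustration; real keyword engines use more refined scoring such as BM25.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_score(query, document):
    """Fraction of query tokens that appear verbatim in the document."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(document)) / len(query_tokens)

query = "How do I fix a broken heart?"

# Hypothetical documents, invented for this illustration.
docs = {
    "cardiology": "Surgeons can fix a broken heart valve",
    "breakup": "Recovering emotionally after a relationship ends",
}

scores = {name: keyword_score(query, text) for name, text in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

The cardiology document wins on word overlap even though the breakup document is closer to what the user means, which is the gap semantic search is meant to close.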
How It Works
Semantic Search Pipeline
1. INDEX TIME (preparation)
   Documents → Chunks → Embeddings → Vector DB

   "Our refund policy allows returns within 30 days"
       ↓
   [0.234, -0.567, 0.891, ...]
       ↓
   Stored in vector database

2. QUERY TIME (search)
   User query → Embedding → Similarity search

   "Can I get my money back?"
       ↓
   [0.241, -0.559, 0.887, ...]
       ↓
   Find similar vectors
       ↓
   Return: refund policy document (0.94 similarity)

Both queries and documents are converted to the same embedding space for comparison.
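
The query-time step can be sketched in a few lines of Python. This is a toy illustration, not a production setup: it assumes cosine similarity as the metric, uses a plain dict in place of a vector database, and reuses only the three visible components of the vectors above, so the score will not match the 0.94 in the diagram. The "shipping times" vector is invented as a contrast case. In a real system the vectors would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Stand-in for a vector database: document name -> embedding.
index = {
    "refund policy": [0.234, -0.567, 0.891],   # truncated vector from the diagram
    "shipping times": [-0.700, 0.300, 0.100],  # hypothetical contrast document
}

# Truncated embedding of "Can I get my money back?" from the diagram.
query_vector = [0.241, -0.559, 0.887]

def search(query_vector, index, top_k=1):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vector, vec))
              for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

results = search(query_vector, index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is fine for small collections; vector databases typically replace it with approximate nearest-neighbor indexes to stay fast at scale.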
Hybrid Search
Best of Both Worlds
SEMANTIC ONLY - Limitations:
  • May miss exact matches (model numbers, IDs)
  • Can be confused by technical jargon
  • Struggles with very short queries

KEYWORD ONLY - Limitations:
  • Misses synonyms and paraphrases
  • Requires exact word matches
  • No understanding of intent

HYBRID APPROACH:
  1. Run both semantic AND keyword search
  2. Combine and re-rank results
  3. Get benefits of both

Score = (α × semantic_score) + (β × keyword_score)

Many production systems use hybrid search for the best results.
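
The weighted score can be sketched as follows. The weights (α = 0.7, β = 0.3), the document names, and the per-document scores are all hypothetical, and the sketch assumes both scores are already normalized to the range 0 to 1 so they can be mixed meaningfully.

```python
def hybrid_score(semantic_score, keyword_score, alpha=0.7, beta=0.3):
    """Weighted sum: Score = alpha * semantic_score + beta * keyword_score."""
    return alpha * semantic_score + beta * keyword_score

# Hypothetical results for a query that contains an exact model number.
# Scores are assumed to be normalized to [0, 1] beforehand.
candidates = {
    "Model X-200 user manual": {"semantic": 0.62, "keyword": 1.00},
    "General troubleshooting guide": {"semantic": 0.80, "keyword": 0.10},
}

def rank(candidates, alpha=0.7, beta=0.3):
    """Return document names ordered by hybrid score, best first."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(
            candidates[name]["semantic"], candidates[name]["keyword"], alpha, beta
        ),
        reverse=True,
    )

ranked = rank(candidates)
for name in ranked:
    s = candidates[name]
    print(f"{name}: {hybrid_score(s['semantic'], s['keyword']):.3f}")
```

With these numbers the manual ranks first because its exact keyword match outweighs the guide's higher semantic score. Setting beta to 0 reduces this to semantic-only ranking, so the weights are worth tuning against real user queries.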

Interactive Exercise

Predict Search Results

For the query below, predict which document a semantic search would return as the top match:

Query: "My laptop won't turn on"

Documents:

  • A: "Troubleshooting computer power issues"
  • B: "How to turn on dark mode"
  • C: "Laptop carrying cases on sale"

Which would rank highest and why?

Pro Tips
  • Semantic search works best with natural language queries
  • Consider hybrid search for production systems
  • Test with real user queries, not just ideal examples
  • Re-ranking can significantly improve final results
