Memory and Context / Core Components

Embedding Model

Intermediate [3/5]
Encoder model · Text encoder · Sentence transformer

Definition

Embedding models convert text (or other data) into dense vector representations that capture semantic meaning. These vectors enable similarity comparisons, clustering, and retrieval—the foundation of modern RAG systems.

Unlike generative LLMs that produce text, embedding models produce fixed-length vectors optimized for comparison tasks.

Key Concepts

  • Vector dimensions: output size (typically 384 to 4096)
  • Semantic similarity: similar meanings → close vectors
  • Bi-encoder: encodes query and document independently, so document vectors can be precomputed
  • Fine-tuning: adapting a model to specific domains
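The "similar meanings → close vectors" idea is usually measured with cosine similarity. A minimal sketch using toy 4-dimensional vectors (real embeddings have hundreds of dimensions; the values here are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: ~1.0 for similar, ~0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" -- a real model would produce these from text.
query = np.array([0.9, 0.1, 0.0, 0.2])     # "machine learning basics"
doc_ml = np.array([0.8, 0.2, 0.1, 0.3])    # semantically close document
doc_food = np.array([0.0, 0.1, 0.9, 0.0])  # unrelated topic

print(cosine_similarity(query, doc_ml))    # high: vectors point the same way
print(cosine_similarity(query, doc_food))  # low: nearly orthogonal vectors
```

Because a bi-encoder embeds each side independently, document vectors can be computed once and stored; at query time only the query needs encoding before this comparison runs.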

Examples

Popular Models
Embedding Model Comparison

Model                           Dimensions  Speed   Quality
───────────────────────────────────────────────────────────
OpenAI text-embedding-3-small   1536        Fast    Good
OpenAI text-embedding-3-large   3072        Medium  Excellent
Cohere embed-v3                 1024        Fast    Excellent
Voyage voyage-2                 1024        Medium  Excellent
BGE bge-large-en-v1.5           1024        Fast    Very Good
E5 e5-large-v2                  1024        Fast    Very Good
all-MiniLM-L6-v2                384         V.Fast  Good
───────────────────────────────────────────────────────────

Selection Criteria:
  • Speed: MiniLM for low latency
  • Quality: Cohere/Voyage for best accuracy
  • Balance: BGE/E5 for a good tradeoff
  • API: OpenAI for simplicity
Implementation
Using Embedding Models
```python
from sentence_transformers import SentenceTransformer
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# Option 1: Local model (free, private)
local_model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = local_model.encode([
    "Machine learning basics",
    "Introduction to ML",
    "Cooking recipes",
])  # Shape: (3, 384)

# Option 2: OpenAI API (higher quality)
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Machine learning basics"],
)
embedding = response.data[0].embedding  # list of 1536 floats

# Compare similarity of the local embeddings
similarities = cosine_similarity(embeddings)
# [[1.0,  0.89, 0.12],   ← ML texts are similar
#  [0.89, 1.0,  0.15],   ← to each other
#  [0.12, 0.15, 1.0 ]]   ← cooking is different
```

Interactive Exercise

Choose an Embedding Model

For each scenario, which embedding model would you choose?

1. Processing 10M documents on a budget
2. Legal document search requiring high accuracy
3. Real-time chat with <50ms latency requirement

Pro Tips
  • Always evaluate on YOUR data—benchmarks don't transfer perfectly
  • Consider fine-tuning for specialized domains (legal, medical)
  • Smaller dimensions = faster search but potentially less nuance
  • Batch embedding calls for efficiency with API models
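The batching tip can be sketched as a simple chunking helper. This is an illustrative pattern, not a specific library API; the batch size of 100 is an assumption you should check against your provider's documented limits:

```python
from typing import Iterator

def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield fixed-size chunks so one API call embeds many texts at once."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"document {i}" for i in range(250)]
batches = list(batched(texts, 100))

# 3 API calls instead of 250 single-text calls
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk would then be passed as the `input` list of a single embeddings request, cutting per-call network overhead roughly by the batch size.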

Related Terms