Memory and Context / Core Components

Embedding Model

Intermediate [3/5]
Encoder model · Text encoder · Sentence transformer

Definition

Embedding models convert text (or other data) into dense vector representations that capture semantic meaning. These vectors enable similarity comparisons, clustering, and retrieval—the foundation of modern RAG systems.

Unlike generative LLMs that produce text, embedding models produce fixed-length vectors optimized for comparison tasks.

Key Concepts

  • Vector dimensions: output size (typically 384 to 4096)
  • Semantic similarity: similar meanings → close vectors
  • Bi-encoder: encodes query and document independently, so document vectors can be precomputed
  • Fine-tuning: adapting a model to specific domains
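The "similar meanings → close vectors" idea is usually measured with cosine similarity. A minimal sketch using toy 4-dimensional vectors (real embeddings have hundreds of dimensions; the values here are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: ~1.0 for similar, ~0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" -- a real model would produce these from text.
query = np.array([0.9, 0.1, 0.0, 0.2])     # "machine learning basics"
doc_ml = np.array([0.8, 0.2, 0.1, 0.3])    # semantically close document
doc_food = np.array([0.0, 0.1, 0.9, 0.0])  # unrelated topic

print(cosine_similarity(query, doc_ml))    # high: vectors point the same way
print(cosine_similarity(query, doc_food))  # low: nearly orthogonal vectors
```

Because a bi-encoder embeds each side independently, document vectors can be computed once and stored; at query time only the query needs encoding before this comparison runs.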

Examples

Popular Models
Embedding Model Comparison

Model                           Dimensions  Speed   Quality
───────────────────────────────────────────────────────────
OpenAI text-embedding-3-small   1536        Fast    Good
OpenAI text-embedding-3-large   3072        Medium  Excellent
Cohere embed-v3                 1024        Fast    Excellent
Voyage voyage-2                 1024        Medium  Excellent
BGE bge-large-en-v1.5           1024        Fast    Very Good
E5 e5-large-v2                  1024        Fast    Very Good
all-MiniLM-L6-v2                384         V.Fast  Good
───────────────────────────────────────────────────────────

Selection Criteria:
  • Speed: MiniLM for low latency
  • Quality: Cohere/Voyage for best accuracy
  • Balance: BGE/E5 for a good tradeoff
  • API: OpenAI for simplicity
Implementation
Using Embedding Models
```python
from sentence_transformers import SentenceTransformer
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# Option 1: Local model (free, private)
local_model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = local_model.encode([
    "Machine learning basics",
    "Introduction to ML",
    "Cooking recipes",
])  # Shape: (3, 384)

# Option 2: OpenAI API (higher quality)
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Machine learning basics"],
)
embedding = response.data[0].embedding  # list of 1536 floats

# Compare similarity of the local embeddings
similarities = cosine_similarity(embeddings)
# [[1.0,  0.89, 0.12],   ← ML texts are similar
#  [0.89, 1.0,  0.15],   ← to each other
#  [0.12, 0.15, 1.0 ]]   ← cooking is different
```

Interactive Exercise

Choose an Embedding Model

For each scenario, which embedding model would you choose?

1. Processing 10M documents on a budget
2. Legal document search requiring high accuracy
3. Real-time chat with <50ms latency requirement

Pro Tips
  • Always evaluate on YOUR data—benchmarks don't transfer perfectly
  • Consider fine-tuning for specialized domains (legal, medical)
  • Smaller dimensions = faster search but potentially less nuance
  • Batch embedding calls for efficiency with API models
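The batching tip can be sketched as a simple chunking helper. This is an illustrative pattern, not a specific library API; the batch size of 100 is an assumption you should check against your provider's documented limits:

```python
from typing import Iterator

def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield fixed-size chunks so one API call embeds many texts at once."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"document {i}" for i in range(250)]
batches = list(batched(texts, 100))

# 3 API calls instead of 250 single-text calls
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk would then be passed as the `input` list of a single embeddings request, cutting per-call network overhead roughly by the batch size.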

Related Terms