Information Retrieval / Neural Search

Cross-Encoder

Advanced [4/5]
Reranker · Cross-attention model · Pair classifier

Definition

A Cross-Encoder is a neural model architecture that processes a query-document pair jointly through a transformer to produce a relevance score. Unlike bi-encoders, which embed queries and documents separately, a cross-encoder attends to both at once, enabling more accurate relevance judgments.
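The joint input described above can be sketched as a single packed sequence. A minimal illustration, assuming a BERT-style model where the `[CLS]`/`[SEP]` special tokens delimit the pair (real tokenizers build this sequence internally):

```python
def pack_pair(query: str, document: str) -> str:
    """Build the single joint input a BERT-style cross-encoder scores.

    Both texts share one sequence, so every document token can attend
    to every query token (and vice versa) inside the transformer.
    """
    return f"[CLS] {query} [SEP] {document} [SEP]"

packed = pack_pair("What causes rain?",
                   "Rain forms when water evaporates...")
print(packed)
```

Because the two texts are fused before encoding, no document representation can be pre-computed; this is the root of the speed trade-off discussed below.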

Cross-encoders are commonly used as rerankers in retrieval pipelines, taking initial candidates from faster methods (BM25, bi-encoder) and reordering them for higher precision.
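The retrieve-then-rerank shape of such a pipeline can be sketched in a few lines. This is a toy skeleton: `keyword_retrieve` and `pair_score` are illustrative stand-ins for BM25 and a cross-encoder, not real models.

```python
DOCS = [
    "Rain forms when water vapor condenses in clouds.",
    "The rain in Spain stays mainly in the plain.",
    "Evaporation and condensation drive the water cycle.",
]

def keyword_retrieve(query, docs, k):
    """Stage 1: cheap, recall-oriented retrieval (word-overlap stand-in for BM25)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:k]]

def pair_score(query, doc):
    """Stage 2: expensive, precision-oriented scoring (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    d = doc.lower().split()
    # Toy "interaction" signal: shared words, weighted by document brevity.
    return len(q & set(d)) / (1 + len(d))

def retrieve_then_rerank(query, docs, initial_k=3, k=1):
    # Only the small candidate set ever reaches the expensive scorer.
    candidates = keyword_retrieve(query, docs, initial_k)
    reranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
    return reranked[:k]

print(retrieve_then_rerank("what causes rain", DOCS))
```

The key design point survives even in the toy version: the expensive per-pair scorer only ever sees `initial_k` candidates, never the whole corpus.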

Key Concepts

  • Joint encoding: Query and document processed together
  • Cross-attention: Tokens attend across query and document
  • Reranking: Reorder initial retrieval results
  • Accuracy vs speed trade-off: More accurate but slower
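The accuracy-vs-speed trade-off in the last bullet comes down to simple arithmetic: a bi-encoder pays one forward pass per query, while a cross-encoder pays one per pair. A back-of-envelope sketch (the millisecond figures are illustrative assumptions, not benchmarks):

```python
N_DOCS = 1_000_000
MS_PER_FORWARD_PASS = 10   # one transformer pass over a query or (query, doc) pair
MS_ANN_LOOKUP = 50         # approximate nearest-neighbour search over precomputed vectors

# Bi-encoder: document vectors are pre-computed offline, so a query costs
# one encoding pass plus an ANN lookup.
bi_encoder_ms = MS_PER_FORWARD_PASS + MS_ANN_LOOKUP

# Cross-encoder over the full corpus: one forward pass per (query, doc) pair.
cross_full_ms = N_DOCS * MS_PER_FORWARD_PASS   # ~2.8 hours per query

# Cross-encoder used as a reranker over the top 100 candidates only.
rerank_ms = bi_encoder_ms + 100 * MS_PER_FORWARD_PASS

print(bi_encoder_ms, cross_full_ms, rerank_ms)  # 60 10000000 1060
```

This is why cross-encoders are deployed as rerankers: reranking 100 candidates costs about one second under these assumptions, versus hours for exhaustive scoring.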

Examples

Comparison
Cross-Encoder vs Bi-Encoder
BI-ENCODER (separate encoding):

  Query: "What causes rain?"
      ↓ [Encoder]
  Query vector: [0.2, 0.5, ...]

  Document: "Rain forms when water evaporates..."
      ↓ [Encoder]
  Doc vector: [0.3, 0.4, ...]

  Score = cosine_similarity(query_vec, doc_vec)

  Pros: fast (document vectors can be pre-computed)
  Cons: no query-document interaction during encoding

CROSS-ENCODER (joint encoding):

  Input: "[CLS] What causes rain? [SEP] Rain forms when water evaporates... [SEP]"
      ↓ [Transformer]
  (Full attention between query and document tokens)
      ↓
  Score: 0.92

  Pros: query and document tokens attend to each other
  Cons: must run the model for every pair (slow)

VISUAL COMPARISON:

  Bi-Encoder:
  ┌─────────┐   ┌─────────┐
  │  Query  │   │ Document│
  └────┬────┘   └────┬────┘
       ↓             ↓
  ┌─────────┐   ┌─────────┐
  │Encoder A│   │Encoder B│   (same or different)
  └────┬────┘   └────┬────┘
       ↓             ↓
   [0.2,0.5]     [0.3,0.4]   → cosine → score

  Cross-Encoder:
  ┌────────────────────────────────┐
  │      Query [SEP] Document      │
  └───────────────┬────────────────┘
                  ↓
  ┌───────────────────────────────┐
  │      Single Transformer       │
  │   (cross-attention between    │
  │         all tokens)           │
  └───────────────┬───────────────┘
                  ↓
             Score: 0.92

PERFORMANCE COMPARISON:

  ┌─────────────────┬────────────┬───────────────┐
  │ Aspect          │ Bi-Encoder │ Cross-Encoder │
  ├─────────────────┼────────────┼───────────────┤
  │ Accuracy        │ Good       │ Excellent     │
  │ Speed (1M docs) │ ~50ms      │ ~hours        │
  │ Pre-computation │ Yes        │ No            │
  │ Use case        │ Retrieval  │ Reranking     │
  └─────────────────┴────────────┴───────────────┘
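The bi-encoder side of the comparison is plain vector math. A minimal sketch, using toy 2-d stand-ins for the truncated vectors in the diagram:

```python
import math

def cosine_similarity(a, b):
    """Bi-encoder scoring: angle between independently produced vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.2, 0.5]  # toy stand-in for the query embedding
doc_vec = [0.3, 0.4]    # toy stand-in for the document embedding
print(round(cosine_similarity(query_vec, doc_vec), 3))  # 0.966
```

Note what this similarity can never capture: the two vectors were produced without either text seeing the other, which is exactly the interaction a cross-encoder adds.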
Implementation
Cross-Encoder Reranking Pipeline
RERANKING PIPELINE:

  Step 1: Initial retrieval (fast, recall-focused)
  ┌─────────────────────────────────────────────┐
  │ Query → BM25 / Bi-Encoder → Top 100 docs    │
  │                                             │
  │ Speed: ~50ms for millions of docs           │
  │ Recall: ~90% (relevant docs in top 100)     │
  └─────────────────────────────────────────────┘
                       ↓
  Step 2: Reranking (slow, precision-focused)
  ┌─────────────────────────────────────────────┐
  │ Cross-Encoder scores each of 100 candidates │
  │                                             │
  │ Speed: ~500ms for 100 docs                  │
  │ Precision: much improved ordering           │
  └─────────────────────────────────────────────┘
                       ↓
  Step 3: Return top K
  ┌─────────────────────────────────────────────┐
  │ Return the top 5-10 reranked documents      │
  └─────────────────────────────────────────────┘

CODE IMPLEMENTATION:

  from sentence_transformers import CrossEncoder

  # Load a distilled reranker model
  reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

  def search_with_rerank(query, k=5, initial_k=100):
      # Step 1: Initial retrieval (bm25_search is the first-stage retriever)
      candidates = bm25_search(query, k=initial_k)

      # Step 2: Prepare (query, document) pairs for the cross-encoder
      pairs = [[query, doc.text] for doc in candidates]

      # Step 3: Score every pair with the cross-encoder
      scores = reranker.predict(pairs)

      # Step 4: Sort candidates by cross-encoder score, descending
      reranked = sorted(
          zip(candidates, scores),
          key=lambda x: x[1],
          reverse=True
      )
      return [doc for doc, score in reranked[:k]]

IMPACT ON RAG QUALITY:

  Without reranking:
  Query: "How to handle null pointer in Java?"
  1. "Null values in databases"           (BM25 matched "null")
  2. "Java NullPointerException guide"    ← relevant
  3. "Pointer arithmetic in C"            (matched "pointer")

  With cross-encoder reranking:
  1. "Java NullPointerException guide"    ✓
  2. "Handling null values in Java"       ✓
  3. "Null safety best practices"         ✓

  The cross-encoder understands "null pointer in Java" as a concept,
  not just a set of keyword matches.

Interactive Exercise

Design Retrieval Pipeline

You're building a legal document search system. Users need highly accurate results (wrong legal advice is dangerous) but also reasonable speed (under 2 seconds). You have 500K documents. Design the retrieval pipeline.

Pro Tips
  • Rerank fewer candidates for speed (50-100 typically sufficient)
  • Distilled cross-encoders (MiniLM) balance speed and accuracy
  • Cross-encoders can be fine-tuned for domain-specific relevance
  • Consider GPU batch processing for production reranking

Related Terms