Information Retrieval / Neural Search

Cross-Encoder

Advanced [4/5]
Reranker · Cross-attention model · Pair classifier

Definition

A Cross-Encoder is a neural model architecture that processes a query-document pair jointly through a transformer to produce a relevance score. Unlike bi-encoders, which embed queries and documents separately, a cross-encoder attends to both at once, enabling more accurate relevance judgments.
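The joint input described above can be sketched as a single packed sequence. A minimal illustration, assuming a BERT-style model where the `[CLS]`/`[SEP]` special tokens delimit the pair (real tokenizers build this sequence internally):

```python
def pack_pair(query: str, document: str) -> str:
    """Build the single joint input a BERT-style cross-encoder scores.

    Both texts share one sequence, so every document token can attend
    to every query token (and vice versa) inside the transformer.
    """
    return f"[CLS] {query} [SEP] {document} [SEP]"

packed = pack_pair("What causes rain?",
                   "Rain forms when water evaporates...")
print(packed)
```

Because the two texts are fused before encoding, no document representation can be pre-computed; this is the root of the speed trade-off discussed below.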

Cross-encoders are commonly used as rerankers in retrieval pipelines, taking initial candidates from faster methods (BM25, bi-encoder) and reordering them for higher precision.
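The retrieve-then-rerank shape of such a pipeline can be sketched in a few lines. This is a toy skeleton: `keyword_retrieve` and `pair_score` are illustrative stand-ins for BM25 and a cross-encoder, not real models.

```python
DOCS = [
    "Rain forms when water vapor condenses in clouds.",
    "The rain in Spain stays mainly in the plain.",
    "Evaporation and condensation drive the water cycle.",
]

def keyword_retrieve(query, docs, k):
    """Stage 1: cheap, recall-oriented retrieval (word-overlap stand-in for BM25)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:k]]

def pair_score(query, doc):
    """Stage 2: expensive, precision-oriented scoring (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    d = doc.lower().split()
    # Toy "interaction" signal: shared words, weighted by document brevity.
    return len(q & set(d)) / (1 + len(d))

def retrieve_then_rerank(query, docs, initial_k=3, k=1):
    # Only the small candidate set ever reaches the expensive scorer.
    candidates = keyword_retrieve(query, docs, initial_k)
    reranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
    return reranked[:k]

print(retrieve_then_rerank("what causes rain", DOCS))
```

The key design point survives even in the toy version: the expensive per-pair scorer only ever sees `initial_k` candidates, never the whole corpus.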

Key Concepts

  • Joint encoding: Query and document processed together
  • Cross-attention: Tokens attend across query and document
  • Reranking: Reorder initial retrieval results
  • Accuracy vs speed trade-off: More accurate but slower
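The accuracy-vs-speed trade-off in the last bullet comes down to simple arithmetic: a bi-encoder pays one forward pass per query, while a cross-encoder pays one per pair. A back-of-envelope sketch (the millisecond figures are illustrative assumptions, not benchmarks):

```python
N_DOCS = 1_000_000
MS_PER_FORWARD_PASS = 10   # one transformer pass over a query or (query, doc) pair
MS_ANN_LOOKUP = 50         # approximate nearest-neighbour search over precomputed vectors

# Bi-encoder: document vectors are pre-computed offline, so a query costs
# one encoding pass plus an ANN lookup.
bi_encoder_ms = MS_PER_FORWARD_PASS + MS_ANN_LOOKUP

# Cross-encoder over the full corpus: one forward pass per (query, doc) pair.
cross_full_ms = N_DOCS * MS_PER_FORWARD_PASS   # ~2.8 hours per query

# Cross-encoder used as a reranker over the top 100 candidates only.
rerank_ms = bi_encoder_ms + 100 * MS_PER_FORWARD_PASS

print(bi_encoder_ms, cross_full_ms, rerank_ms)  # 60 10000000 1060
```

This is why cross-encoders are deployed as rerankers: reranking 100 candidates costs about one second under these assumptions, versus hours for exhaustive scoring.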

Examples

Comparison
Cross-Encoder vs Bi-Encoder
BI-ENCODER (separate encoding):

  Query: "What causes rain?"
      ↓ [Encoder]
  Query vector: [0.2, 0.5, ...]

  Document: "Rain forms when water evaporates..."
      ↓ [Encoder]
  Doc vector: [0.3, 0.4, ...]

  Score = cosine_similarity(query_vec, doc_vec)

  Pros: fast (document vectors can be pre-computed)
  Cons: no query-document interaction during encoding

CROSS-ENCODER (joint encoding):

  Input: "[CLS] What causes rain? [SEP] Rain forms when water evaporates... [SEP]"
      ↓ [Transformer]
  (Full attention between query and document tokens)
      ↓
  Score: 0.92

  Pros: query and document tokens attend to each other
  Cons: must run the model for every pair (slow)

VISUAL COMPARISON:

  Bi-Encoder:
  ┌─────────┐   ┌─────────┐
  │  Query  │   │ Document│
  └────┬────┘   └────┬────┘
       ↓             ↓
  ┌─────────┐   ┌─────────┐
  │Encoder A│   │Encoder B│   (same or different)
  └────┬────┘   └────┬────┘
       ↓             ↓
   [0.2,0.5]     [0.3,0.4]   → cosine → score

  Cross-Encoder:
  ┌────────────────────────────────┐
  │      Query [SEP] Document      │
  └───────────────┬────────────────┘
                  ↓
  ┌───────────────────────────────┐
  │      Single Transformer       │
  │   (cross-attention between    │
  │         all tokens)           │
  └───────────────┬───────────────┘
                  ↓
             Score: 0.92

PERFORMANCE COMPARISON:

  ┌─────────────────┬────────────┬───────────────┐
  │ Aspect          │ Bi-Encoder │ Cross-Encoder │
  ├─────────────────┼────────────┼───────────────┤
  │ Accuracy        │ Good       │ Excellent     │
  │ Speed (1M docs) │ ~50ms      │ ~hours        │
  │ Pre-computation │ Yes        │ No            │
  │ Use case        │ Retrieval  │ Reranking     │
  └─────────────────┴────────────┴───────────────┘
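The bi-encoder side of the comparison is plain vector math. A minimal sketch, using toy 2-d stand-ins for the truncated vectors in the diagram:

```python
import math

def cosine_similarity(a, b):
    """Bi-encoder scoring: angle between independently produced vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.2, 0.5]  # toy stand-in for the query embedding
doc_vec = [0.3, 0.4]    # toy stand-in for the document embedding
print(round(cosine_similarity(query_vec, doc_vec), 3))  # 0.966
```

Note what this similarity can never capture: the two vectors were produced without either text seeing the other, which is exactly the interaction a cross-encoder adds.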
Implementation
Cross-Encoder Reranking Pipeline
RERANKING PIPELINE:

  Step 1: Initial retrieval (fast, recall-focused)
  ┌─────────────────────────────────────────────┐
  │ Query → BM25 / Bi-Encoder → Top 100 docs    │
  │                                             │
  │ Speed: ~50ms for millions of docs           │
  │ Recall: ~90% (relevant docs in top 100)     │
  └─────────────────────────────────────────────┘
                       ↓
  Step 2: Reranking (slow, precision-focused)
  ┌─────────────────────────────────────────────┐
  │ Cross-Encoder scores each of 100 candidates │
  │                                             │
  │ Speed: ~500ms for 100 docs                  │
  │ Precision: much improved ordering           │
  └─────────────────────────────────────────────┘
                       ↓
  Step 3: Return top K
  ┌─────────────────────────────────────────────┐
  │ Return the top 5-10 reranked documents      │
  └─────────────────────────────────────────────┘

CODE IMPLEMENTATION:

  from sentence_transformers import CrossEncoder

  # Load a distilled reranker model
  reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

  def search_with_rerank(query, k=5, initial_k=100):
      # Step 1: Initial retrieval (bm25_search is the first-stage retriever)
      candidates = bm25_search(query, k=initial_k)

      # Step 2: Prepare (query, document) pairs for the cross-encoder
      pairs = [[query, doc.text] for doc in candidates]

      # Step 3: Score every pair with the cross-encoder
      scores = reranker.predict(pairs)

      # Step 4: Sort candidates by cross-encoder score, descending
      reranked = sorted(
          zip(candidates, scores),
          key=lambda x: x[1],
          reverse=True
      )
      return [doc for doc, score in reranked[:k]]

IMPACT ON RAG QUALITY:

  Without reranking:
  Query: "How to handle null pointer in Java?"
  1. "Null values in databases"           (BM25 matched "null")
  2. "Java NullPointerException guide"    ← relevant
  3. "Pointer arithmetic in C"            (matched "pointer")

  With cross-encoder reranking:
  1. "Java NullPointerException guide"    ✓
  2. "Handling null values in Java"       ✓
  3. "Null safety best practices"         ✓

  The cross-encoder understands "null pointer in Java" as a concept,
  not just a set of keyword matches.

Interactive Exercise

Design Retrieval Pipeline

You're building a legal document search system. Users need highly accurate results (wrong legal advice is dangerous) but also reasonable speed (under 2 seconds). You have 500K documents. Design the retrieval pipeline.

Pro Tips
  • Rerank fewer candidates for speed (50-100 typically sufficient)
  • Distilled cross-encoders (MiniLM) balance speed and accuracy
  • Cross-encoders can be fine-tuned for domain-specific relevance
  • Consider GPU batch processing for production reranking

Related Terms