Memory and Context / Retrieval Methods

Reranking

Advanced [4/5]
Cross-encoder reranking · Second-stage ranking · Re-scoring

Definition

Reranking is a two-stage retrieval approach where an initial fast retrieval returns candidate documents, which are then reordered by a more accurate but slower model. The reranker examines query-document pairs together for deeper relevance assessment.

This combines the efficiency of first-stage retrieval with the accuracy of cross-attention models.

Key Concepts

  • Two-stage pipeline: Fast retrieval → accurate reranking
  • Cross-encoder: Processes query and document together
  • Bi-encoder vs cross-encoder: Speed vs accuracy tradeoff
  • Top-k filtering: Only rerank top candidates (e.g., top 100)
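The bi-encoder vs cross-encoder tradeoff above can be sketched with toy scoring functions (illustrative only, not real models): a bi-encoder scores against precomputed, independently encoded vectors, while a cross-encoder must examine each query-document pair together at query time.

```python
# Toy sketch of the bi-encoder vs cross-encoder contrast
# (illustrative scoring functions, not real models).

def bi_encoder_score(query_vec, doc_vec):
    # Query and document are encoded INDEPENDENTLY; relevance is a
    # cheap dot product between precomputed vectors.
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, doc):
    # Query and document are examined TOGETHER, so the scorer can see
    # term interactions. Here: fraction of query words found in the doc.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

# Document vectors can be precomputed offline -> fast first stage.
docs = {
    "deep learning tutorial": [0.9, 0.1],
    "network security guide": [0.2, 0.8],
}
query_vec = [1.0, 0.0]
for text, vec in docs.items():
    print(text, round(bi_encoder_score(query_vec, vec), 2))

# The cross-encoder runs per pair at query time -> slow but sharper.
print(cross_encoder_score("train a neural network",
                          "neural network basics"))  # 0.5
```

The key operational consequence: bi-encoder document vectors are computed once and indexed, so only the cross-encoder's per-pair cost scales with the number of candidates you rerank.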

Examples

Architecture
Two-Stage Retrieval Pipeline
```
Query: "How to train a neural network?"

STAGE 1: Fast Retrieval (bi-encoder)
─────────────────────────────────────
Search 1M documents → Return top 100 candidates
Speed: ~10ms | Accuracy: Good

Candidates:
  1. "Neural network basics"    (score: 0.85)
  2. "Deep learning tutorial"   (score: 0.82)
  3. "Training ML models"       (score: 0.80)
  ...
100. "Network security guide"   (score: 0.45)

STAGE 2: Reranking (cross-encoder)
─────────────────────────────────────
Score 100 candidates with query → Reorder
Speed: ~500ms | Accuracy: Excellent

Reranked Results:
  1. "Deep learning tutorial"   (score: 0.95) ↑ moved up
  2. "Training ML models"       (score: 0.91) ↑ moved up
  3. "Neural network basics"    (score: 0.88) ↓ moved down
  ...
100. "Network security guide"   (score: 0.12) ← correctly low
```
Implementation
Cross-Encoder Reranking
```python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, documents, top_k=10):
        """
        Rerank documents based on relevance to query.

        Cross-encoder processes [query, doc] pairs together,
        allowing deep interaction between query and document.
        """
        # Create query-document pairs
        pairs = [[query, doc["text"]] for doc in documents]

        # Score all pairs
        scores = self.model.predict(pairs)

        # Combine scores with documents
        for doc, score in zip(documents, scores):
            doc["rerank_score"] = float(score)

        # Sort by reranked score
        reranked = sorted(
            documents,
            key=lambda x: x["rerank_score"],
            reverse=True
        )
        return reranked[:top_k]

# Usage in RAG pipeline:
candidates = vector_search(query, k=100)               # Fast
results = reranker.rerank(query, candidates, top_k=5)  # Accurate
```

Interactive Exercise

Design Reranking Pipeline

You have 10 million documents and need sub-second latency. Design your retrieval + reranking pipeline:

- How many candidates for first stage?
- How many to rerank?
- How many final results?
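One way to approach the exercise is a back-of-envelope latency budget. The timings below are illustrative assumptions (actual numbers depend on your index, hardware, and reranker model), but the arithmetic shows how the reranking candidate count dominates total latency:

```python
# Back-of-envelope latency budget (all timings are illustrative
# assumptions, not measurements).

ANN_SEARCH_MS = 30       # assumed ANN search cost over 10M vectors
RERANK_MS_PER_DOC = 5    # assumed cross-encoder cost per query-doc pair
BUDGET_MS = 1000         # "sub-second" target

def pipeline_latency(n_rerank):
    # Stage 1 is roughly constant; stage 2 scales with candidates.
    return ANN_SEARCH_MS + n_rerank * RERANK_MS_PER_DOC

# Largest candidate set that still fits the budget:
max_rerank = (BUDGET_MS - ANN_SEARCH_MS) // RERANK_MS_PER_DOC
print(max_rerank)             # 194 -> rerank ~100-150 to leave headroom
print(pipeline_latency(100))  # 530 (ms)
```

Under these assumptions, a reasonable design is: retrieve ~500-1000 candidates in stage 1, rerank the top ~100, and return the final top 5-10.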

Pro Tips
  • Rerank 50-200 candidates for good accuracy/speed balance
  • Cross-encoders are roughly 100x slower than bi-encoders but significantly more accurate
  • Consider LLM-based reranking for complex queries
  • Cache reranking results for repeated queries
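The caching tip can be as simple as memoizing the pipeline on the query string. A minimal sketch using the standard library (the rerank pipeline is stubbed here so the example is self-contained; in practice it would call stage 1 and stage 2 of your pipeline, and assumes the candidate set for a given query is stable):

```python
from functools import lru_cache

CALLS = {"rerank": 0}

@lru_cache(maxsize=10_000)
def cached_rerank(query: str, top_k: int = 5) -> tuple:
    # Stand-in for the full retrieve + rerank pipeline, stubbed so the
    # sketch runs on its own. Returning a tuple (not a list) keeps the
    # cached value immutable and hashable.
    CALLS["rerank"] += 1
    ranked = ["deep learning tutorial", "neural network basics"]
    return tuple(ranked[:top_k])

cached_rerank("how to train a neural network")
cached_rerank("how to train a neural network")  # served from cache
print(CALLS["rerank"])  # 1 -> the expensive pipeline ran only once
```

Note that `lru_cache` keys on the exact argument values, so "How to train a neural network?" and "how to train a neural network" are separate entries; normalizing queries before lookup improves the hit rate.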

Related Terms