Memory and Context / Retrieval Methods

Reranking

Advanced [4/5]
Cross-encoder reranking · Second-stage ranking · Re-scoring

Definition

Reranking is a two-stage retrieval approach where an initial fast retrieval returns candidate documents, which are then reordered by a more accurate but slower model. The reranker examines query-document pairs together for deeper relevance assessment.

This combines the efficiency of first-stage retrieval with the accuracy of cross-attention models.

Key Concepts

  • Two-stage pipeline: Fast retrieval → accurate reranking
  • Cross-encoder: Processes query and document together
  • Bi-encoder vs cross-encoder: Speed vs accuracy tradeoff
  • Top-k filtering: Only rerank top candidates (e.g., top 100)
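The bi-encoder vs cross-encoder tradeoff above can be sketched with toy scoring functions (illustrative only, not real models): a bi-encoder scores against precomputed, independently encoded vectors, while a cross-encoder must examine each query-document pair together at query time.

```python
# Toy sketch of the bi-encoder vs cross-encoder contrast
# (illustrative scoring functions, not real models).

def bi_encoder_score(query_vec, doc_vec):
    # Query and document are encoded INDEPENDENTLY; relevance is a
    # cheap dot product between precomputed vectors.
    return sum(q * d for q, d in zip(query_vec, doc_vec))

def cross_encoder_score(query, doc):
    # Query and document are examined TOGETHER, so the scorer can see
    # term interactions. Here: fraction of query words found in the doc.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

# Document vectors can be precomputed offline -> fast first stage.
docs = {
    "deep learning tutorial": [0.9, 0.1],
    "network security guide": [0.2, 0.8],
}
query_vec = [1.0, 0.0]
for text, vec in docs.items():
    print(text, round(bi_encoder_score(query_vec, vec), 2))

# The cross-encoder runs per pair at query time -> slow but sharper.
print(cross_encoder_score("train a neural network",
                          "neural network basics"))  # 0.5
```

The key operational consequence: bi-encoder document vectors are computed once and indexed, so only the cross-encoder's per-pair cost scales with the number of candidates you rerank.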

Examples

Architecture
Two-Stage Retrieval Pipeline
```
Query: "How to train a neural network?"

STAGE 1: Fast Retrieval (bi-encoder)
─────────────────────────────────────
Search 1M documents → Return top 100 candidates
Speed: ~10ms | Accuracy: Good

Candidates:
  1. "Neural network basics"    (score: 0.85)
  2. "Deep learning tutorial"   (score: 0.82)
  3. "Training ML models"       (score: 0.80)
  ...
100. "Network security guide"   (score: 0.45)

STAGE 2: Reranking (cross-encoder)
─────────────────────────────────────
Score 100 candidates with query → Reorder
Speed: ~500ms | Accuracy: Excellent

Reranked Results:
  1. "Deep learning tutorial"   (score: 0.95) ↑ moved up
  2. "Training ML models"       (score: 0.91) ↑ moved up
  3. "Neural network basics"    (score: 0.88) ↓ moved down
  ...
100. "Network security guide"   (score: 0.12) ← correctly low
```
Implementation
Cross-Encoder Reranking
```python
from sentence_transformers import CrossEncoder

class Reranker:
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query, documents, top_k=10):
        """
        Rerank documents based on relevance to query.

        Cross-encoder processes [query, doc] pairs together,
        allowing deep interaction between query and document.
        """
        # Create query-document pairs
        pairs = [[query, doc["text"]] for doc in documents]

        # Score all pairs
        scores = self.model.predict(pairs)

        # Combine scores with documents
        for doc, score in zip(documents, scores):
            doc["rerank_score"] = float(score)

        # Sort by reranked score
        reranked = sorted(
            documents,
            key=lambda x: x["rerank_score"],
            reverse=True
        )
        return reranked[:top_k]

# Usage in RAG pipeline:
candidates = vector_search(query, k=100)               # Fast
results = reranker.rerank(query, candidates, top_k=5)  # Accurate
```

Interactive Exercise

Design Reranking Pipeline

You have 10 million documents and need sub-second latency. Design your retrieval + reranking pipeline:

- How many candidates for first stage?
- How many to rerank?
- How many final results?
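One way to approach the exercise is a back-of-envelope latency budget. The timings below are illustrative assumptions (actual numbers depend on your index, hardware, and reranker model), but the arithmetic shows how the reranking candidate count dominates total latency:

```python
# Back-of-envelope latency budget (all timings are illustrative
# assumptions, not measurements).

ANN_SEARCH_MS = 30       # assumed ANN search cost over 10M vectors
RERANK_MS_PER_DOC = 5    # assumed cross-encoder cost per query-doc pair
BUDGET_MS = 1000         # "sub-second" target

def pipeline_latency(n_rerank):
    # Stage 1 is roughly constant; stage 2 scales with candidates.
    return ANN_SEARCH_MS + n_rerank * RERANK_MS_PER_DOC

# Largest candidate set that still fits the budget:
max_rerank = (BUDGET_MS - ANN_SEARCH_MS) // RERANK_MS_PER_DOC
print(max_rerank)             # 194 -> rerank ~100-150 to leave headroom
print(pipeline_latency(100))  # 530 (ms)
```

Under these assumptions, a reasonable design is: retrieve ~500-1000 candidates in stage 1, rerank the top ~100, and return the final top 5-10.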

Pro Tips
  • Rerank 50-200 candidates for good accuracy/speed balance
  • Cross-encoders are roughly 100x slower than bi-encoders but significantly more accurate
  • Consider LLM-based reranking for complex queries
  • Cache reranking results for repeated queries
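The caching tip can be as simple as memoizing the pipeline on the query string. A minimal sketch using the standard library (the rerank pipeline is stubbed here so the example is self-contained; in practice it would call stage 1 and stage 2 of your pipeline, and assumes the candidate set for a given query is stable):

```python
from functools import lru_cache

CALLS = {"rerank": 0}

@lru_cache(maxsize=10_000)
def cached_rerank(query: str, top_k: int = 5) -> tuple:
    # Stand-in for the full retrieve + rerank pipeline, stubbed so the
    # sketch runs on its own. Returning a tuple (not a list) keeps the
    # cached value immutable and hashable.
    CALLS["rerank"] += 1
    ranked = ["deep learning tutorial", "neural network basics"]
    return tuple(ranked[:top_k])

cached_rerank("how to train a neural network")
cached_rerank("how to train a neural network")  # served from cache
print(CALLS["rerank"])  # 1 -> the expensive pipeline ran only once
```

Note that `lru_cache` keys on the exact argument values, so "How to train a neural network?" and "how to train a neural network" are separate entries; normalizing queries before lookup improves the hit rate.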

Related Terms