Memory and Context / Retrieval Methods

Vector Search

Intermediate [3/5]
Also known as: similarity search, nearest neighbor search, ANN search

Definition

Vector search finds items similar to a query by comparing their vector representations in a high-dimensional space. It's the core operation powering semantic search, recommendation systems, and RAG pipelines.

Efficient vector search relies on approximate nearest neighbor (ANN) algorithms, which trade a small amount of recall for the speed to search billions of vectors in milliseconds.
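At its simplest, the core operation is a brute-force (exact) scan over every stored vector. A minimal NumPy sketch, using a toy three-dimensional corpus (the vectors and labels are illustrative, not real embeddings):

```python
import numpy as np

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
corpus = np.array([
    [0.9, 0.1, 0.0],   # "dog"
    [0.8, 0.2, 0.1],   # "puppy"
    [0.1, 0.9, 0.0],   # "cat"
    [0.0, 0.8, 0.2],   # "kitten"
])

def exact_search(query, vectors, k=2):
    """Brute-force nearest neighbors by Euclidean (L2) distance."""
    dists = np.linalg.norm(vectors - query, axis=1)  # distance to every vector
    order = np.argsort(dists)[:k]                    # indices of the k smallest
    return [(int(i), float(dists[i])) for i in order]

query = np.array([0.88, 0.12, 0.02])  # a query embedding near "dog"
print(exact_search(query, corpus))    # "dog" (index 0), then "puppy" (index 1)
```

This is the exact search described below: accurate, but O(n) per query, which is why large collections use ANN indexes instead.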

Key Concepts

  • Distance metrics: cosine similarity, Euclidean (L2) distance, dot product
  • Exact search: Compares all vectors (accurate but slow)
  • ANN algorithms: HNSW, IVF, PQ for fast approximate search
  • Index structures: Pre-built data structures for efficient lookup
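
The three distance metrics behave differently when vectors differ in magnitude. A quick NumPy comparison on two parallel vectors (toy values chosen for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

def dot(u, v):
    return float(np.dot(u, v))

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(euclidean(a, b))  # 3.74... - sensitive to magnitude
print(dot(a, b))        # 28.0  - grows with magnitude
print(cosine(a, b))     # 1.0   - direction only: parallel vectors match perfectly
```

Cosine ignores magnitude entirely, which is why it suits normalized text embeddings; Euclidean and dot product both change when a vector is scaled.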

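To see why an index like IVF is fast, here is a stripped-down sketch of the inverted-file idea in plain NumPy: vectors are assigned to coarse clusters up front, and a query scans only the few clusters nearest to it. This is a simplification; real IVF trains centroids with k-means and tunes the number of probed clusters:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8)).astype('float32')

# Coarse partition: pick random vectors as centroids (a stand-in for
# k-means training) and assign every vector to its nearest centroid.
n_clusters = 10
centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
assignments = np.argmin(
    np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1
)

def ivf_search(query, k=5, n_probe=2):
    """Search only the n_probe clusters whose centroids are nearest the query."""
    cluster_order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidate_ids = np.where(np.isin(assignments, cluster_order[:n_probe]))[0]
    dists = np.linalg.norm(vectors[candidate_ids] - query, axis=1)
    top = np.argsort(dists)[:k]
    return candidate_ids[top], dists[top]

# Querying with a stored vector finds that vector itself at distance 0.
ids, dists = ivf_search(vectors[0])
```

Only a fraction of the corpus is compared per query, which is the source of the speedup; the cost is that a true neighbor living in an unprobed cluster can be missed.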
Examples

Concept
Vector Space Similarity
2D visualization (actual embeddings have 384-1536 dimensions): imagine "dog", "puppy", "cat", and "kitten" plotted as points, with "dog" close to "puppy" and "cat" close to "kitten".

Query: "puppy"
Results by distance:
  1. "dog"    → distance: 0.15 (closest)
  2. "kitten" → distance: 0.35
  3. "cat"    → distance: 0.40

Similar concepts cluster together in vector space!
Implementation
Vector Search with FAISS
import faiss
import numpy as np

class VectorSearchEngine:
    def __init__(self, dimension, use_gpu=False):
        self.dimension = dimension
        # Create index (IVF for large-scale search)
        quantizer = faiss.IndexFlatL2(dimension)
        self.index = faiss.IndexIVFFlat(
            quantizer, dimension, 100  # 100 clusters
        )
        if use_gpu:
            self.index = faiss.index_cpu_to_gpu(
                faiss.StandardGpuResources(), 0, self.index
            )

    def build_index(self, vectors):
        """Train and add vectors to the index."""
        vectors = np.array(vectors).astype('float32')
        self.index.train(vectors)
        self.index.add(vectors)

    def search(self, query_vector, k=10):
        """Find the k nearest neighbors of a query vector."""
        query = np.array([query_vector]).astype('float32')
        distances, indices = self.index.search(query, k)
        return [
            {"index": int(idx), "distance": float(dist)}
            for idx, dist in zip(indices[0], distances[0])
        ]

# Search billions of vectors in milliseconds!

Interactive Exercise

Choose the Right Metric

Match each use case to the best distance metric:

Metrics: Cosine, Euclidean, Dot Product

Use Cases:
1. Comparing sentence embeddings
2. Comparing raw image pixel values
3. Maximum inner product search for recommendations

Pro Tips
  • Normalize vectors before indexing when you use dot product; on unit-length vectors, dot product equals cosine similarity
  • HNSW gives best recall/speed tradeoff for most cases
  • IVF is better for very large datasets (>100M vectors)
  • Test with your actual data—benchmarks don't always transfer
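
The first tip can be made concrete: after L2-normalization, a plain inner product returns cosine similarity, so an inner-product index (e.g. FAISS's IndexFlatIP) behaves like a cosine index. A minimal illustration:

```python
import numpy as np

def normalize(vectors):
    """L2-normalize each row so dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

vecs = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = normalize(vecs)

# Every row now has length 1, so a dot product between rows is
# exactly the cosine similarity of the original vectors.
print(np.dot(unit[0], unit[1]))  # 0.6, the cosine of [3, 4] and [1, 0]
```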

Related Terms