Model Internals / Representations

Embedding

Beginner [2/5]
Also known as: vector representation, dense representation, neural embedding

Definition

An embedding is a dense vector representation that captures the meaning of discrete objects (like words, tokens, or entities) in a continuous vector space. Similar items have similar embeddings, enabling mathematical operations on meaning.

Embeddings are the foundation of how neural networks understand and process language.

Key Concepts

  • Dense vectors: Compact representations (e.g., 768 dimensions)
  • Learned: Embeddings are trained to capture useful patterns
  • Semantic similarity: Similar meanings → nearby vectors
  • Operations: Can add, subtract, average embeddings (see the sketch after this list)
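
For example, here is a minimal NumPy sketch of these operations (the 4-dimensional vectors are toy values invented for illustration; real embeddings are learned during training):

    import numpy as np

    # Toy 4-dimensional embeddings (values invented for illustration)
    cat = np.array([0.2, -0.5, 0.8, 0.1])
    dog = np.array([0.3, -0.4, 0.7, 0.2])
    car = np.array([-0.8, 0.1, 0.2, -0.6])

    def cosine(a, b):
        # Cosine similarity: near 1 = same direction, near 0 or below = unrelated
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(cat, dog))  # high (~0.98): similar meanings point the same way
    print(cosine(cat, car))  # low (~-0.11): unrelated meanings diverge
    pet = (cat + dog) / 2    # averaging blends the two meanings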

Examples

Concept
One-Hot vs Embedding
VOCABULARY: [cat, dog, fish, bird, car, bus]

ONE-HOT ENCODING (sparse):
    cat  → [1, 0, 0, 0, 0, 0]
    dog  → [0, 1, 0, 0, 0, 0]
    fish → [0, 0, 1, 0, 0, 0]

Problems:
    - No similarity info (cat-dog is exactly as distant as cat-car)
    - Huge vectors for large vocabularies
    - No generalization between words

EMBEDDING (dense):
    cat  → [ 0.2, -0.5,  0.8,  0.1]
    dog  → [ 0.3, -0.4,  0.7,  0.2]  ← similar to cat!
    fish → [ 0.1,  0.6, -0.3,  0.4]  ← different
    car  → [-0.8,  0.1,  0.2, -0.6]  ← very different

Benefits:
    - Captures similarity (cat ≈ dog)
    - Compact (4 dims here vs 50,000+ for one-hot)
    - Generalizes: what the model learns about one word transfers to similar words
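
The same contrast in code, as a minimal sketch (the embedding values are invented for illustration; in practice the table is a learned model parameter):

    import numpy as np

    vocab = ['cat', 'dog', 'fish', 'bird', 'car', 'bus']

    # One-hot: each word is a sparse row of the identity matrix
    one_hot = np.eye(len(vocab))

    # Embedding: each word is a dense row of a (normally learned) matrix
    embedding_table = np.array([
        [ 0.2, -0.5,  0.8,  0.1],  # cat
        [ 0.3, -0.4,  0.7,  0.2],  # dog
        [ 0.1,  0.6, -0.3,  0.4],  # fish
        [ 0.2,  0.5, -0.2,  0.5],  # bird (invented, like the rest)
        [-0.8,  0.1,  0.2, -0.6],  # car
        [-0.7,  0.2,  0.1, -0.5],  # bus (invented, like the rest)
    ])

    # An "embedding lookup" is just row indexing
    cat_vec = embedding_table[vocab.index('cat')]

    # Every pair of distinct one-hot vectors has dot product 0 ...
    print(one_hot[0] @ one_hot[1], one_hot[0] @ one_hot[4])  # 0.0 0.0
    # ... while embeddings distinguish related from unrelated pairs
    print(cat_vec @ embedding_table[1], cat_vec @ embedding_table[4])  # ~0.84 vs ~-0.11
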
Famous Example
Word Arithmetic
SEMANTIC ARITHMETIC (word2vec discovery):

    king - man + woman ≈ queen

The vector space captures relationships:
    - king and man are related (male royalty)
    - queen and woman are related (female royalty)
    - king - man ≈ the "royalty" concept
    - "royalty" + woman ≈ queen

MORE EXAMPLES:
    Paris - France + Italy ≈ Rome
    walked - walk + swim ≈ swam
    bigger - big + small ≈ smaller

    # Python code
    from gensim.models import KeyedVectors

    # binary=True is needed for word2vec's binary .bin format
    model = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)
    result = model.most_similar(
        positive=['king', 'woman'],
        negative=['man'],
    )
    # Returns: [('queen', 0.89), ...]

This works because embeddings encode relationships as directions in vector space!
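
To make the "directions" claim concrete, here is a minimal sketch that does the arithmetic by hand, assuming gensim and its downloadable 'glove-wiki-gigaword-50' vectors (fetched on first use):

    import numpy as np
    import gensim.downloader as api

    glove = api.load('glove-wiki-gigaword-50')  # small public GloVe vectors

    # The analogy as raw vector arithmetic
    target = glove['king'] - glove['man'] + glove['woman']

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Score candidate words by closeness to the target direction
    for w in ['queen', 'princess', 'car']:
        print(w, round(float(cosine(target, glove[w])), 3))
    # queen and princess score far above car; gensim's most_similar does the
    # same ranking over the whole vocabulary (and filters out the input words)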

Interactive Exercise

Predict Similarity

Rank these word pairs by embedding similarity (1 = most similar):

A. happy - sad
B. happy - joyful
C. happy - computer
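
You can check your ranking empirically; a minimal sketch, again assuming gensim's downloadable GloVe vectors:

    import gensim.downloader as api

    glove = api.load('glove-wiki-gigaword-50')
    for a, b in [('happy', 'sad'), ('happy', 'joyful'), ('happy', 'computer')]:
        print(a, '-', b, round(float(glove.similarity(a, b)), 3))
    # Caveat: antonyms like happy/sad often score surprisingly high, because
    # they appear in very similar contexts; happy/computer should score lowest.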

Pro Tips
  • Embeddings are the first layer of most NLP models
  • Pre-trained embeddings (Word2Vec, GloVe) jump-start training
  • Contextual embeddings (from transformers) vary by context
  • Visualize with t-SNE or UMAP to see clustering (see the sketch after this list)
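
As a minimal visualization sketch (assuming scikit-learn, matplotlib, and the GloVe vectors used above):

    import numpy as np
    import matplotlib.pyplot as plt
    import gensim.downloader as api
    from sklearn.manifold import TSNE

    glove = api.load('glove-wiki-gigaword-50')
    words = ['cat', 'dog', 'fish', 'bird', 'car', 'bus', 'train',
             'happy', 'sad', 'joyful']
    X = np.array([glove[w] for w in words])

    # t-SNE projects 50-d vectors to 2-d; perplexity must be < number of points
    coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)

    plt.scatter(coords[:, 0], coords[:, 1])
    for w, (x, y) in zip(words, coords):
        plt.annotate(w, (x, y))
    plt.show()  # animals, vehicles, and emotions should form rough clusters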

Related Terms