Skip to content
Glossary

Embedding

An embedding is a numeric vector that represents the meaning of text, images, or other data in a high-dimensional space. Items with similar meaning produce vectors that sit close together, which lets systems compare, cluster, and retrieve content by semantic similarity rather than exact matches.

Synonyms: vector embedding, text embedding, semantic vector, dense representation

Embeddings are the bridge between human language and similarity math. An embedding model maps each input to a fixed-length vector so that semantically related items cluster together, enabling vector search, clustering, classification, and deduplication. In a retrieval pipeline, both the indexed chunks and the incoming query are embedded with the same model so distances are meaningful. Because the embedding model defines the space, its version is metadata worth tracking for reproducibility and controlled reindexing.

Frequently asked questions

Why does the embedding model version matter?
Vectors from different models are not comparable. Storing the model version with each embedding lets you detect drift and reindex safely when you upgrade the embedding model.
Are embeddings reversible back to the original text?
Not exactly, but embeddings can leak sensitive information, so they should inherit the same tenant isolation and access controls as the source content they represent.