Are embeddings the same as vectors?

An embedding is a vector, but not every vector is an embedding. A vector is just an ordered list of numbers - a set of coordinates, an RGB color, or a spreadsheet row can each be a vector. An embedding is a vector specifically produced by a model so that distance between vectors reflects similarity in meaning.

What model creates embeddings?

A dedicated embedding model, usually a smaller, specialized model trained to turn text into vectors rather than to generate text. It runs alongside (not instead of) the language model that writes the final answer.

Do embeddings capture meaning perfectly?

No. They capture approximate similarity learned from training data, which is very useful but not infallible - two passages can be conceptually related yet sit further apart than expected, or share surface wording while meaning different things. That's why retrieval is often paired with a reranking step rather than relying on vector distance alone.

Vector embedding - Definition

A vector embedding is a list of numbers that represents a piece of text in a way that captures its meaning, so that passages with similar meanings end up close together and unrelated ones end up far apart.

What it actually is

A model reads a sentence and outputs an array of numbers - often hundreds or thousands of them - called a vector. You can think of each vector as coordinates for a point in a very high-dimensional space. The key property is that distance in that space reflects meaning, not wording: "cancel my subscription" and "how do I end my plan" land near each other even though they share almost no words, while "cancel my subscription" and "renew for another year" sit further apart even though both concern the same subscription. That is what lets software compare ideas rather than just match keywords.
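As a rough illustration of "distance reflects meaning", the sketch below compares the three phrases above using cosine similarity, one common way to measure closeness between vectors. The three-number vectors are hand-made for the example and are not output from any real model; a real embedding would have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (an assumption for illustration, NOT real model output).
toy_embeddings = {
    "cancel my subscription": [0.9, 0.1, 0.2],
    "how do I end my plan": [0.8, 0.2, 0.3],
    "renew for another year": [0.1, 0.9, 0.2],
}

query = toy_embeddings["cancel my subscription"]
similarities = {
    text: cosine_similarity(query, vector)
    for text, vector in toy_embeddings.items()
}

# Print the phrases from most to least similar to the query.
for text, score in sorted(similarities.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.2f}  {text}")
```

With these toy numbers, "how do I end my plan" scores much closer to the query than "renew for another year", which is the behavior a trained model is meant to produce.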

Why it matters

Keyword search fails the moment the reader and the document use different words for the same thing. Embeddings fix this. By turning both the question and every passage into vectors, a system can rank passages by closeness in meaning - the basis of semantic search.
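To make the keyword problem concrete, here is a minimal sketch of a word-overlap score (Jaccard overlap on lowercase words). The function name and the example phrases are illustrative assumptions; real keyword search engines use more elaborate scoring, but they share the same blind spot.

```python
def keyword_overlap(query, passage):
    """Jaccard overlap between the lowercase word sets of two strings."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words or not passage_words:
        return 0.0
    shared = query_words & passage_words
    combined = query_words | passage_words
    return len(shared) / len(combined)

# Same idea, different words: no overlap at all.
paraphrase_overlap = keyword_overlap("refund policy", "money-back guarantee")

# Different ideas, one shared word: a higher score than the paraphrase.
shared_word_overlap = keyword_overlap("refund policy", "privacy policy")

print(paraphrase_overlap, shared_word_overlap)
```

The paraphrase scores zero while the unrelated pair scores higher, which is exactly the ranking a meaning-based comparison is supposed to reverse.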

Meaning space explained

It helps to picture the space embeddings live in, even though no one can actually visualize hundreds of dimensions at once. Imagine every possible sentence as a point somewhere in a vast space, positioned so that sentences meaning roughly the same thing cluster together and sentences meaning different things sit apart - the way a map places nearby towns close together and distant ones far apart, except the "distance" here is conceptual closeness rather than geography.

An embedding model's job is to learn where each piece of text belongs in that space. It's trained on huge amounts of text so that, for example, "refund policy," "money-back guarantee," and "getting your money back" all land in roughly the same neighborhood, while "refund policy" and "shipping address" end up nowhere near each other - even though "refund" and "shipping" are both perfectly ordinary words that could appear in the same document. The model isn't matching letters or word roots; it has learned, from patterns across enormous amounts of text, which ideas tend to occur in similar contexts, and it encodes that learned closeness directly as position in the space.

This is why embeddings generalize so well. A model doesn't need to have seen the exact sentence "how do I get my money back" during training - it only needs to have learned that this kind of phrasing tends to mean the same thing as other refund-related language, and it will still place it in the right neighborhood of meaning space.

Embeddings vs vectors

The words "embedding" and "vector" get used almost interchangeably in casual conversation, and that's understandable - every embedding is a vector. But not every vector is an embedding, and the distinction is worth being precise about.

A vector, mathematically, is nothing more than an ordered list of numbers. Plenty of ordinary things are vectors: a GPS coordinate, an RGB color value, a row in a spreadsheet, three numbers describing a product's height, width and weight. None of those carries any notion of "meaning" - they're just structured data, and distance between two of them measures a literal numeric difference, not conceptual similarity.
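For a plain vector such as an RGB color, distance is simply arithmetic on the raw numbers. The short sketch below uses Python's standard `math.dist` (Euclidean distance) on three example colors chosen for illustration.

```python
import math

# Plain vectors: RGB colors as (red, green, blue) channel values.
red = (255, 0, 0)
dark_red = (200, 0, 0)
green = (0, 255, 0)

# Euclidean distance measures how far apart the channel values are.
red_to_dark_red = math.dist(red, dark_red)  # 55.0
red_to_green = math.dist(red, green)

print(red_to_dark_red, red_to_green)
```

The result says how different the channel values are, nothing more. No model was trained to make these distances mean anything beyond that.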

An embedding is a specific, narrower thing: a vector that a trained model produces specifically so that distance reflects semantic similarity. The numbers themselves are not individually meaningful - you can't point at one dimension and say "this is the sadness score" - but their arrangement relative to other embeddings is what encodes meaning. That property doesn't come for free; it's the entire point of training the model, and it's what separates an embedding from an arbitrary list of numbers that happens to be the same length.

Aspect	Embedding	Raw / plain vector
What it is	A vector produced by a trained model so distance reflects meaning	Any ordered list of numbers
Where it comes from	An embedding model trained on large amounts of text (or images, audio, etc.)	Anywhere - sensor readings, coordinates, colors, manually defined features
Does distance mean similarity?	Yes, by design - that's what the model was trained to produce	Only if the numbers happen to represent something where numeric distance is meaningful
Typical length	Hundreds to thousands of dimensions	Any length - often just 2–5 for everyday data (like GPS or RGB)
Used for	Semantic search, retrieval, clustering by meaning	Whatever the raw numbers represent - plotting, physical measurement, direct calculation

The practical upshot: when someone in an AI context says "vector," they almost always mean "embedding" - the two terms have converged in that specific domain, even though "vector" is the broader mathematical concept and "embedding" is the trained, meaning-aware version of it that retrieval systems actually rely on.

How it underpins retrieval

Embeddings are the engine behind retrieval-augmented generation:

1. Each chunk of a document is embedded once and stored.
2. Your question is embedded at query time.
3. The system finds the passages whose vectors are nearest to the question's vector and feeds those to the model as context.

This is how an assistant answers from a 90-page PDF without reading all 90 pages every time.
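The three retrieval steps can be sketched in a few lines. Everything here is an assumption made for illustration: `toy_embed` is a hypothetical stand-in that counts hand-picked topic words, whereas a real system would call a trained embedding model and usually a vector database. The sample chunks are made up as well.

```python
import math

# Hypothetical stand-in for an embedding model: one dimension per topic,
# counting hand-picked topic words. A trained model learns this mapping itself.
TOPIC_WORDS = [
    {"refund", "refunds", "money-back", "guarantee"},
    {"shipping", "delivery", "address"},
    {"cancel", "end", "subscription", "plan"},
]

def toy_embed(text):
    """Turn text into a small vector of topic-word counts (illustration only)."""
    words = [word.strip(".,?!") for word in text.lower().split()]
    return [sum(word in topic for word in words) for topic in TOPIC_WORDS]

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

chunks = [
    "Refunds are covered by our money-back guarantee.",
    "Update your shipping address before the order leaves the warehouse.",
    "You can cancel your subscription from the account page.",
]

# Step 1: embed each chunk once and store the vector next to its text.
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(question, index, top_k=1):
    """Return the top_k stored chunks nearest to the question."""
    # Step 2: embed the question at query time.
    question_vector = toy_embed(question)
    # Step 3: rank stored chunks by similarity to the question vector.
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]

context = retrieve("How do I end my plan?", index)
print(context)
```

Only the chunks returned by `retrieve` would be passed to the language model as context, so the cost per question depends on `top_k` rather than on the length of the whole document.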

When you ask Sidenote about a document, embeddings help locate the exact passages that bear on your question, so the answer is built from real text in front of you - and every claim is backed by a citation that scrolls to its source.

