What is the role of vectors in an LLM?
Client:
Between my prompt and the LLM answer, what exactly happens? And why does everyone talk about vectors?
Me:
Great question.
The key idea is simple: an LLM does not manipulate words the way we do. It manipulates numbers.
Those numbers are organized as vectors.
A vector here is a list of values that represents a token (part of a word), a sentence, or even a full document.
What happens between your prompt and the answer
When you send a prompt, the model follows a sequence of steps:
- Tokenization: the text is split into units (words, subwords, punctuation).
- Vector conversion: each token is converted into a numeric vector (embedding).
- Attention calculation: the model compares tokens to identify what matters most in context.
- Next-token prediction: it computes probabilities and selects the most likely token (or a nearby one, depending on temperature).
- Generation loop: the generated token is added to the context, then the process repeats until the full answer is produced.
In short, the answer is built token by token, not in a single shot.
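The loop above can be sketched in miniature. This is a toy illustration, not a real model: the vocabulary, the `next_token_probs` table, and all the probabilities are invented for the example.

```python
# Toy next-token distribution: in a real LLM this comes from the
# network's final layer; here it is a hand-written lookup table.
def next_token_probs(context):
    table = {
        (): {"The": 1.0},
        ("The",): {"order": 0.7, "invoice": 0.3},
        ("The", "order"): {"ships": 0.6, "is": 0.4},
        ("The", "order", "ships"): {"today": 0.8, ".": 0.2},
        ("The", "order", "ships", "today"): {".": 1.0},
    }
    return table.get(tuple(context), {"<end>": 1.0})

def generate(max_tokens=10):
    context = []
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        # Greedy decoding: always take the most likely token.
        # Sampling with a temperature would sometimes pick a nearby one.
        token = max(probs, key=probs.get)
        if token == "<end>":
            break
        context.append(token)  # the generated token is fed back into the context
    return " ".join(context)

print(generate())  # the answer is built token by token
```

Each pass through the loop produces exactly one token, which is why longer answers take proportionally longer to generate.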
Why vectors are central
Vectors let the model capture semantic proximity.
For example, in vector space:
- “invoice” is close to “payment”,
- “contract” is close to “clause”,
- “delay” is closer to “deadline” than to “marketing”.
This geometry helps the model stay coherent even when phrasing changes.
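Semantic proximity is usually measured with cosine similarity between vectors. A minimal sketch, with toy 4-dimensional vectors invented for the example (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, hand-picked for illustration only.
vectors = {
    "delay":     [0.9, 0.1, 0.3, 0.0],
    "deadline":  [0.8, 0.2, 0.4, 0.1],
    "marketing": [0.1, 0.9, 0.0, 0.7],
}

print(cosine(vectors["delay"], vectors["deadline"]))   # high: close in meaning
print(cosine(vectors["delay"], vectors["marketing"]))  # low: far apart
```

With these toy values, "delay" scores far higher against "deadline" than against "marketing", which is exactly the geometry described above.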
A business example
Imagine you write:
Draft a client response about a delivery delay, reassuring tone, max 5 lines.
The model will:
- transform that instruction into vectors,
- understand the constraints (delay, reassuring tone, length),
- generate a sentence,
- check at each token whether the continuation is consistent with context.
This is not magic. It is a sequence of probability computations in vector space.
To make this even more concrete, here is a minimal example.
A simple breakdown: text, tokens, vectors
Take this sentence:
The client asks for a quick quote.
One possible tokenization:
The | client | asks | for | a | quick | quote | .
Depending on the tokenizer, some words can also be split (example: qu + ick).
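A naive tokenizer can be sketched in a few lines. This sketch only splits on whitespace and punctuation; real tokenizers (BPE, WordPiece) learn subword units from data, which is why a word like "quick" may end up split into "qu" + "ick".

```python
import re

def toy_tokenize(text):
    # Keep runs of word characters as tokens, and punctuation
    # marks as separate single-character tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("The client asks for a quick quote."))
# ['The', 'client', 'asks', 'for', 'a', 'quick', 'quote', '.']
```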
Then each token is transformed into a numeric vector.
Simplified example (4 dimensions only, for illustration):
The -> [0.12, -0.44, 0.08, 0.31]
client -> [0.91, 0.15, -0.22, 0.07]
asks -> [0.55, -0.11, 0.49, -0.03]
quick -> [0.41, -0.05, 0.77, -0.21]
quote -> [0.88, 0.34, -0.09, 0.12]
In real models, vectors often have hundreds or thousands of dimensions, but the logic is the same.
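In code, this token-to-vector step is just a lookup in a table. A sketch using the toy 4-dimensional values from the example (in a real model this is a learned matrix with one row per vocabulary token):

```python
# Toy embedding table: token -> 4-dimensional vector.
embeddings = {
    "The":    [0.12, -0.44, 0.08, 0.31],
    "client": [0.91, 0.15, -0.22, 0.07],
    "asks":   [0.55, -0.11, 0.49, -0.03],
    "quick":  [0.41, -0.05, 0.77, -0.21],
    "quote":  [0.88, 0.34, -0.09, 0.12],
}

tokens = ["The", "client", "asks", "quick", "quote"]
prompt_vectors = [embeddings[t] for t in tokens]  # one vector per token
print(prompt_vectors[0])  # the vector for "The"
```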
From vectors to matrix computations: what happens next
A vector alone is not enough.
To produce useful outputs, the model applies matrix operations to these vectors.
In very simple terms:
- a token is represented by a vector x,
- this vector is multiplied by a weight matrix W,
- you get a new vector y containing transformed information.
We can write it as:
y = W x
Why it matters:
these multiplications allow the model to:
- mix information,
- amplify useful signals,
- build better internal representations layer after layer.
In an LLM, this logic is repeated many times (with multiple matrices), which is how the model goes from raw text to a relevant next-token prediction.
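The y = W x step can be written out in plain Python. The 2×4 weight matrix below is invented for the illustration; in a real model, W is learned during training and far larger.

```python
def matvec(W, x):
    # y = W x: each output value mixes every input value,
    # weighted by one row of the matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

x = [0.12, -0.44, 0.08, 0.31]       # vector for one token
W = [[1.0, 0.0, 0.5,  0.0],         # toy 2x4 weight matrix (invented values)
     [0.0, 1.0, 0.0, -0.5]]

y = matvec(W, x)
print(y)  # a new, transformed vector
```

Stacking several such multiplications (with different matrices, plus non-linear steps between them) is what "layer after layer" means in practice.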
And in a RAG system, where do vectors fit?
In RAG systems, internal documents are also converted into vectors.
When you ask a question, the system searches for vectors closest to your prompt to retrieve the right passages.
Those passages are then injected into the model's context before generation.
So:
- vectors are used to retrieve the right information (RAG),
- then used to generate the right formulation (LLM).
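The retrieval half can be sketched in a few lines. The document store, its 3-dimensional vectors, and the question vector are all invented for the example; a real system would embed text with a model and search millions of vectors with a dedicated index.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document store: each passage already converted to a vector.
documents = {
    "Delivery policy: orders ship within 5 days.": [0.9, 0.1, 0.2],
    "Marketing plan for Q3 campaigns.":            [0.1, 0.8, 0.3],
    "Invoice payment terms: 30 days net.":         [0.2, 0.3, 0.9],
}

def retrieve(question_vector, top_k=1):
    # Rank passages by similarity to the question and keep the best ones.
    ranked = sorted(documents.items(),
                    key=lambda kv: cosine(question_vector, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# A question about a delivery delay maps to a vector near the policy doc.
question_vector = [0.85, 0.15, 0.25]
print(retrieve(question_vector))  # this passage gets injected into the context
```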
What I recommend you keep in mind
If you are a non-technical executive, keep these 3 ideas:
- An LLM first works like a machine that predicts using probabilities.
- Vectors are the internal language used to represent meaning.
- Output quality depends as much on provided context (prompt, documents, constraints) as on the model itself.
The better you frame the input, the more useful, reliable, and actionable the output becomes.